How to convert to/from Python objects#

Built-in Python objects like dicts and lists can be converted into Awkward Arrays, and all Awkward Arrays can be converted into Python objects. Awkward type information, such as the distinction between fixed-size and variable-length lists, is lost in the conversion to Python objects.

import awkward as ak
import numpy as np
import pandas as pd

From Python to Awkward#

The function for Python → Awkward conversion is ak.from_iter().

py_objects = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
py_objects
[[1.1, 2.2, 3.3], [], [4.4, 5.5]]
ak_array = ak.from_iter(py_objects)
ak_array
[[1.1, 2.2, 3.3],
 [],
 [4.4, 5.5]]
-----------------------
backend: cpu
nbytes: 72 B
type: 3 * var * float64

See the sections below for how Python types are mapped to Awkward types.

Note that this should be considered a slow, memory-intensive function: not only does it need to iterate over Python data, but it also needs to discover the type of the data progressively. Internally, this function uses an ak.ArrayBuilder to accumulate data and discover types simultaneously. Don’t, for instance, convert a large, numerical dataset from NumPy or Arrow into Python objects just to use ak.from_iter(). There are specialized functions for that, such as ak.from_numpy() and ak.from_arrow(): see their tutorials.

This is also the fallback operation of the ak.Array and ak.Record constructors. Usually, small examples are built by passing Python objects directly to these constructors.

ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
[[1.1, 2.2, 3.3],
 [],
 [4.4, 5.5]]
-----------------------
backend: cpu
nbytes: 72 B
type: 3 * var * float64
ak.Record({"x": 1, "y": [1.1, 2.2]})
{x: 1,
 y: [1.1, 2.2]}
------------------------------------------
backend: cpu
nbytes: 24 B
type: {
    x: int64,
    y: 2 * float64
}

From Awkward to Python#

The function for Awkward → Python conversion is ak.to_list().

ak_array = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
ak_array
[[1.1, 2.2, 3.3],
 [],
 [4.4, 5.5]]
-----------------------
backend: cpu
nbytes: 72 B
type: 3 * var * float64
ak.to_list(ak_array)
[[1.1, 2.2, 3.3], [], [4.4, 5.5]]
ak_record = ak.Record({"x": 1, "y": [1.1, 2.2]})
ak_record
{x: 1,
 y: [1.1, 2.2]}
------------------------------------------
backend: cpu
nbytes: 24 B
type: {
    x: int64,
    y: 2 * float64
}
ak.to_list(ak_record)
{'x': 1, 'y': [1.1, 2.2]}

Note that this should be considered a slow, memory-intensive function, like ak.from_iter(). Don’t, for instance, convert a large, numerical dataset with ak.to_list() just to convert those lists into NumPy or Arrow. There are specialized functions for that, such as ak.to_numpy() and ak.to_arrow(): see their tutorials.

Awkward Arrays and Records have a to_list method. For small datasets (or a small slice of a dataset), this is a convenient way to get a quick view.

x = ak.Array(np.arange(1000))
y = ak.Array(np.tile(np.array([0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]), 100))
ak_array = ak.zip({"x": x, "y": y})
ak_array
[{x: 0, y: 0},
 {x: 1, y: 1.1},
 {x: 2, y: 2.2},
 {x: 3, y: 3.3},
 {x: 4, y: 4.4},
 {x: 5, y: 5.5},
 {x: 6, y: 6.6},
 {x: 7, y: 7.7},
 {x: 8, y: 8.8},
 {x: 9, y: 9.9},
 ...,
 {x: 991, y: 1.1},
 {x: 992, y: 2.2},
 {x: 993, y: 3.3},
 {x: 994, y: 4.4},
 {x: 995, y: 5.5},
 {x: 996, y: 6.6},
 {x: 997, y: 7.7},
 {x: 998, y: 8.8},
 {x: 999, y: 9.9}]
---------------------------------------------
backend: cpu
nbytes: 16.0 kB
type: 1000 * {
    x: int64,
    y: float64
}
ak_array[100].to_list()
{'x': 100, 'y': 0.0}
ak_array[100:110].to_list()
[{'x': 100, 'y': 0.0},
 {'x': 101, 'y': 1.1},
 {'x': 102, 'y': 2.2},
 {'x': 103, 'y': 3.3},
 {'x': 104, 'y': 4.4},
 {'x': 105, 'y': 5.5},
 {'x': 106, 'y': 6.6},
 {'x': 107, 'y': 7.7},
 {'x': 108, 'y': 8.8},
 {'x': 109, 'y': 9.9}]

Pandas-style constructor#

As we have seen, the ak.Array constructor interprets an iterable argument as the data that it is meant to represent, as in:

py_objects1 = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
py_objects1
[[1.1, 2.2, 3.3], [], [4.4, 5.5]]
ak.Array(py_objects1)
[[1.1, 2.2, 3.3],
 [],
 [4.4, 5.5]]
-----------------------
backend: cpu
nbytes: 72 B
type: 3 * var * float64

But sometimes, you have several iterables that you want to use as columns of a table. The Pandas DataFrame constructor interprets a dict of iterables as columns:

py_objects2 = ["one", "two", "three"]
py_objects2
['one', 'two', 'three']
pd.DataFrame({"x": py_objects1, "y": py_objects2})
                 x      y
0  [1.1, 2.2, 3.3]    one
1               []    two
2       [4.4, 5.5]  three

And so does the ak.Array constructor:

ak_array = ak.Array({"x": py_objects1, "y": py_objects2})
ak_array
[{x: [1.1, 2.2, 3.3], y: 'one'},
 {x: [], y: 'two'},
 {x: [4.4, 5.5], y: 'three'}]
-------------------------------------------------
backend: cpu
nbytes: 115 B
type: 3 * {
    x: var * float64,
    y: string
}
ak.type(ak_array)
ArrayType(RecordType([ListType(NumpyType('float64')), ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'})], ['x', 'y']), 3, None)
ak.to_list(ak_array)
[{'x': [1.1, 2.2, 3.3], 'y': 'one'},
 {'x': [], 'y': 'two'},
 {'x': [4.4, 5.5], 'y': 'three'}]

Note that this is the transpose of the way the data would be interpreted if it were in a list, rather than a dict. The "x" and "y" values are interpreted as being interleaved in each record. There is no potential for conflict between the ak.from_iter()-style and Pandas-style constructors because ak.from_iter() applied to a dict would always return an ak.Record, rather than an ak.Array.

ak_record = ak.from_iter({"x": py_objects1, "y": py_objects2})
ak_record
{x: [[1.1, 2.2, 3.3], [], [4.4, 5.5]],
 y: ['one', 'two', 'three']}
---------------------------------------------------------
backend: cpu
nbytes: 147 B
type: {
    x: var * var * float64,
    y: var * string
}
ak.type(ak_record)
ScalarType(RecordType([ListType(ListType(NumpyType('float64'))), ListType(ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'}))], ['x', 'y']), None)
ak.to_list(ak_record)
{'x': [[1.1, 2.2, 3.3], [], [4.4, 5.5]], 'y': ['one', 'two', 'three']}

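The ak.from_iter() function applied to a dict holds the same data as the ak.Record constructor applied to that dict. The reported types differ slightly, though: the constructor reports the length of each top-level list (3 * var * float64), where ak.from_iter() reports var * var * float64.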

ak.Record({"x": py_objects1, "y": py_objects2})
{x: [[1.1, 2.2, 3.3], [], [4.4, 5.5]],
 y: ['one', 'two', 'three']}
-----------------------------------------------------
backend: cpu
nbytes: 115 B
type: {
    x: 3 * var * float64,
    y: 3 * string
}

Conversion of numbers and booleans#

Python float, int, and bool (so-called “primitive” types) are converted to float64, int64, and bool types in Awkward Arrays.

All floating-point Awkward types are converted to Python’s float, all integral Awkward types are converted to Python’s int, and Awkward’s boolean type is converted to Python’s bool.

ak.Array([1.1, 2.2, 3.3])
[1.1,
 2.2,
 3.3]
-----------------
backend: cpu
nbytes: 24 B
type: 3 * float64
ak.Array([1.1, 2.2, 3.3]).to_list()
[1.1, 2.2, 3.3]
ak.Array([1, 2, 3, 4, 5])
[1,
 2,
 3,
 4,
 5]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64
ak.Array([1, 2, 3, 4, 5]).to_list()
[1, 2, 3, 4, 5]
ak.Array([True, False, True, False, False])
[True,
 False,
 True,
 False,
 False]
--------------
backend: cpu
nbytes: 5 B
type: 5 * bool
ak.Array([True, False, True, False, False]).to_list()
[True, False, True, False, False]

Conversion of lists#

Python lists, as well as iterables other than dict, tuple, str, and bytes, are converted to Awkward’s variable-length lists. It is not possible to construct fixed-size lists with ak.from_iter(). (One way to do that is by converting a NumPy array with ak.from_numpy().)

Awkward’s variable-length and fixed-size lists are converted into Python lists with ak.to_list().

ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
[[1.1, 2.2, 3.3],
 [],
 [4.4, 5.5]]
-----------------------
backend: cpu
nbytes: 72 B
type: 3 * var * float64
ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]]).to_list()
[[1.1, 2.2, 3.3], [], [4.4, 5.5]]
ak.Array([[1, 2, 3], [4, 5, 6]])
[[1, 2, 3],
 [4, 5, 6]]
---------------------
backend: cpu
nbytes: 72 B
type: 2 * var * int64
ak.Array([[1, 2, 3], [4, 5, 6]]).to_list()
[[1, 2, 3], [4, 5, 6]]

Note

Advanced topic: the rest of this section may be skipped if you don’t care about the distinction between fixed-size and variable-length lists.

Note that a NumPy array is an iterable, so ak.from_iter() iterates over it, constructing variable-length Awkward lists. By contrast, ak.from_numpy() casts the data (without iteration) into fixed-size Awkward lists.

np_array = np.array([[100, 200], [101, 201], [103, 203]])
np_array
array([[100, 200],
       [101, 201],
       [103, 203]])
ak.from_iter(np_array)
[[100, 200],
 [101, 201],
 [103, 203]]
---------------------
backend: cpu
nbytes: 80 B
type: 3 * var * int64
ak.from_numpy(np_array)
[[100, 200],
 [101, 201],
 [103, 203]]
-------------------
backend: cpu
nbytes: 48 B
type: 3 * 2 * int64

Note that the types differ: var * int64 vs 2 * int64. The ak.Array constructor uses ak.from_numpy() if given a NumPy array (with dtype != "O") and ak.from_iter() if given an iterable that it does not recognize.

This can be particularly subtle when NumPy arrays are nested within iterables.

np_array = np.array([[100, 200], [101, 201], [103, 203]])
np_array
array([[100, 200],
       [101, 201],
       [103, 203]])
# This is a NumPy array: constructor uses ak.from_numpy to get an array of fixed-size lists.
ak.Array(np_array)
[[100, 200],
 [101, 201],
 [103, 203]]
-------------------
backend: cpu
nbytes: 48 B
type: 3 * 2 * int64
py_objects = [np.array([100, 200]), np.array([101, 201]), np.array([103, 203])]
py_objects
[array([100, 200]), array([101, 201]), array([103, 203])]

This is a list that contains NumPy arrays: constructor uses ak.from_iter to get an array of variable-length lists.

ak.Array(py_objects)
[[100, 200],
 [101, 201],
 [103, 203]]
---------------------
backend: cpu
nbytes: 80 B
type: 3 * var * int64
np_array_dtype_O = np.array([[100, 200], [101, 201], [103, 203]], dtype="O")
np_array_dtype_O
array([[100, 200],
       [101, 201],
       [103, 203]], dtype=object)

This NumPy array has dtype="O", so it cannot be cast without iteration:

ak.Array(np_array_dtype_O)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[42], line 1
----> 1 ak.Array(np_array_dtype_O)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/highlevel.py:323, in Array.__init__(self, data, behavior, with_name, check_valid, backend, attrs, named_axis)
    320     layout = ak.operations.from_json(data, highlevel=False)
    322 else:
--> 323     layout = ak.operations.to_layout(
    324         data, allow_record=False, regulararray=False, primitive_policy="error"
    325     )
    327 if not isinstance(layout, ak.contents.Content):
    328     raise TypeError("could not convert data into an ak.Array")

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:38, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     35 @wraps(func)
     36 def dispatch(*args, **kwargs):
     37     # NOTE: this decorator assumes that the operation is exposed under `ak.`
---> 38     with OperationErrorContext(name, args, kwargs):
     39         gen_or_result = func(*args, **kwargs)
     40         if isgenerator(gen_or_result):

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_errors.py:80, in ErrorContext.__exit__(self, exception_type, exception_value, traceback)
     78     self._slate.__dict__.clear()
     79     # Handle caught exception
---> 80     raise self.decorate_exception(exception_type, exception_value)
     81 else:
     82     # Step out of the way so that another ErrorContext can become primary.
     83     if self.primary() is self:

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:64, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     62 # Failed to find a custom overload, so resume the original function
     63 try:
---> 64     next(gen_or_result)
     65 except StopIteration as err:
     66     return err.value

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_to_layout.py:80, in to_layout(array, allow_record, allow_unknown, none_policy, use_from_iter, primitive_policy, string_policy, regulararray)
     77 yield (array,)
     79 # Implementation
---> 80 return _impl(
     81     array,
     82     allow_record,
     83     allow_unknown,
     84     none_policy,
     85     regulararray,
     86     use_from_iter,
     87     primitive_policy,
     88     string_policy,
     89 )

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_to_layout.py:177, in _impl(obj, allow_record, allow_unknown, none_policy, regulararray, use_from_iter, primitive_policy, string_policy)
    175     return obj.snapshot()
    176 elif numpy.is_own_array(obj):
--> 177     promoted_layout = ak.operations.from_numpy(
    178         obj,
    179         regulararray=regulararray,
    180         recordarray=True,
    181         highlevel=False,
    182         primitive_policy=primitive_policy,
    183     )
    184     return _handle_array_like(
    185         obj, promoted_layout, primitive_policy=primitive_policy
    186     )
    187 elif Cupy.is_own_array(obj):

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:39, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     35 @wraps(func)
     36 def dispatch(*args, **kwargs):
     37     # NOTE: this decorator assumes that the operation is exposed under `ak.`
     38     with OperationErrorContext(name, args, kwargs):
---> 39         gen_or_result = func(*args, **kwargs)
     40         if isgenerator(gen_or_result):
     41             array_likes = next(gen_or_result)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_from_numpy.py:56, in from_numpy(array, regulararray, recordarray, highlevel, behavior, primitive_policy, attrs)
     11 @high_level_function()
     12 def from_numpy(
     13     array,
   (...)
     20     attrs=None,
     21 ):
     22     """
     23     Args:
     24         array (np.ndarray): The NumPy array to convert into an Awkward Array.
   (...)
     53     See also #ak.to_numpy and #ak.from_cupy.
     54     """
     55     return wrap_layout(
---> 56         from_arraylib(
     57             array, regulararray, recordarray, primitive_policy=primitive_policy
     58         ),
     59         highlevel=highlevel,
     60         behavior=behavior,
     61     )

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_layout.py:367, in from_arraylib(array, regulararray, recordarray, primitive_policy)
    364         return ByteMaskedArray(Index8(mask), data, valid_when=False)
    366 if array.dtype == np.dtype("O"):
--> 367     raise TypeError("Awkward Array does not support arrays with object dtypes.")
    369 if primitive_policy == "error" and array.ndim == 0:
    370     raise TypeError(
    371         f"Encountered a scalar ({type(array).__name__}), but scalar conversion/promotion is disabled"
    372     )

TypeError: Awkward Array does not support arrays with object dtypes.

This error occurred while calling

    ak.to_layout(
        numpy.ndarray([[100 200]  [101 201]  [103 203]])
        allow_record = False
        regulararray = False
        primitive_policy = 'error'
    )

ak.Array knows that this is a NumPy array, but Awkward does not support object dtypes. Instead, one must use ak.from_iter() to tell Awkward that this iteration is intentional.

ak.from_iter(np_array_dtype_O)
[[100, 200],
 [101, 201],
 [103, 203]]
---------------------
backend: cpu
nbytes: 80 B
type: 3 * var * int64

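The logic behind this policy is that only NumPy arrays with dtype != "O" are guaranteed to have fixed-size contents. Everything else, including lists of NumPy arrays and NumPy arrays with dtype="O", may hold items of different lengths, so it must be given var type lists.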

py_objects = [np.array([1.1, 2.2, 3.3]), np.array([]), np.array([4.4, 5.5])]
py_objects
[array([1.1, 2.2, 3.3]), array([], dtype=float64), array([4.4, 5.5])]
ak.Array(py_objects)
[[1.1, 2.2, 3.3],
 [],
 [4.4, 5.5]]
-----------------------
backend: cpu
nbytes: 72 B
type: 3 * var * float64
np_array_dtype_O = np.array([[1.1, 2.2, 3.3], [], [4.4, 5.5]], dtype="O")
np_array_dtype_O
array([list([1.1, 2.2, 3.3]), list([]), list([4.4, 5.5])], dtype=object)
ak.from_iter(np_array_dtype_O)
[[1.1, 2.2, 3.3],
 [],
 [4.4, 5.5]]
-----------------------
backend: cpu
nbytes: 72 B
type: 3 * var * float64

Conversion of strings and bytestrings#

Python strings (type str) are converted to and from Awkward’s UTF-8 encoded strings and Python bytestrings (type bytes) are converted to and from Awkward’s unencoded bytestrings.

ak.Array(["one", "two", "three", "four"])
['one',
 'two',
 'three',
 'four']
----------------
backend: cpu
nbytes: 55 B
type: 4 * string
ak.Array(["one", "two", "three", "four"]).to_list()
['one', 'two', 'three', 'four']
ak.Array([b"one", b"two", b"three", b"four"])
[b'one',
 b'two',
 b'three',
 b'four']
---------------
backend: cpu
nbytes: 55 B
type: 4 * bytes
ak.Array([b"one", b"two", b"three", b"four"]).to_list()
[b'one', b'two', b'three', b'four']

Note

Advanced topic: the rest of this section may be skipped if you don’t care about internal representations.

Awkward’s strings and bytestrings are specializations of variable-length lists. Whereas a list might be internally represented by an ak.contents.ListArray or an ak.contents.ListOffsetArray,

ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]]).layout
<ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>
        [0 3 3 5]
    </Index></offsets>
    <content><NumpyArray dtype='float64' len='5'>[1.1 2.2 3.3 4.4 5.5]</NumpyArray></content>
</ListOffsetArray>

Strings and bytestrings are just ak.contents.ListArrays and ak.contents.ListOffsetArrays of one-byte integers with special parameters:

ak.Array(["one", "two", "three", "four"]).layout
<ListOffsetArray len='4'>
    <parameter name='__array__'>'string'</parameter>
    <offsets><Index dtype='int64' len='5'>
        [ 0  3  6 11 15]
    </Index></offsets>
    <content><NumpyArray dtype='uint8' len='15'>
        <parameter name='__array__'>'char'</parameter>
        [111 110 101 116 119 111 116 104 114 101 101 102 111 117 114]
    </NumpyArray></content>
</ListOffsetArray>

These parameters indicate that the arrays of strings should have special behaviors, such as equality-per-string, rather than equality-per-character.

ak.Array([[1.1, 2.2], [], [3.3]]) == ak.Array([[1.1, 200], [], [3.3]])
[[True, False],
 [],
 [True]]
--------------------
backend: cpu
nbytes: 35 B
type: 3 * var * bool
ak.Array(["one", "two", "three", "four"]) == ak.Array(
    ["one", "TWO", "thirty three", "four"]
)
[True,
 False,
 False,
 True]
--------------
backend: cpu
nbytes: 4 B
type: 4 * bool

(Without this overloaded behavior, the string comparison would yield [True, True, True] for "one" == "one" and would fail to broadcast "three" and "thirty three".)

The fact that strings are really just variable-length lists is worth keeping in mind, since they might behave in unexpectedly list-like ways. If you notice any behavior that ought to be overloaded for strings, recommend it as a feature request.

Conversion of dicts and tuples#

Python dicts with string-valued keys are converted to and from Awkward’s record type with named fields. The data associated with different fields can have different types, but you generally want data associated with all instances of the same field to have the same type. Python dicts with non-string-valued keys have no equivalent in Awkward Array (records are very different from mappings).

Python tuples are converted to and from Awkward’s record type with unnamed fields. Note that Awkward views Python’s lists and tuples in very different ways: lists are expected to be variable-length with all elements having the same type, while tuples are expected to be fixed-size with elements having potentially different types, just like a record.

In the following example, the "x" field has type int64 and the "y" field has type var * int64.

ak_array_rec = ak.Array([{"x": 1, "y": [1, 2]}, {"x": 2, "y": []}])
ak_array_rec
[{x: 1, y: [1, 2]},
 {x: 2, y: []}]
----------------------------------------------
backend: cpu
nbytes: 56 B
type: 2 * {
    x: int64,
    y: var * int64
}
ak_array_rec.to_list()
[{'x': 1, 'y': [1, 2]}, {'x': 2, 'y': []}]

Here is the corresponding example with tuples:

ak_array_tup = ak.Array([(1, [1, 2]), (2, [])])
ak_array_tup
[(1, [1, 2]),
 (2, [])]
----------------------------------------
backend: cpu
nbytes: 56 B
type: 2 * (
    int64,
    var * int64
)
ak_array_tup.to_list()
[(1, [1, 2]), (2, [])]

Both of these Awkward types, {"x": int64, "y": var * int64} and (int64, var * int64), have two fields, but the first one has names for those fields.

Fields of both can be extracted using strings between square brackets, though the strings must be "0" and "1" for the tuple.

ak_array_rec["y"]
[[1, 2],
 []]
---------------------
backend: cpu
nbytes: 40 B
type: 2 * var * int64
ak_array_rec["y", 1]
[]
---------------
backend: cpu
nbytes: 0 B
type: 0 * int64
ak_array_tup["1"]
[[1, 2],
 []]
---------------------
backend: cpu
nbytes: 40 B
type: 2 * var * int64
ak_array_tup["1", 1]
[]
---------------
backend: cpu
nbytes: 0 B
type: 0 * int64

Note the difference in meaning between the "1" and the 1 in the above example. For safety, you may want to use ak.unzip():

x, y = ak.unzip(ak_array_rec)
y
[[1, 2],
 []]
---------------------
backend: cpu
nbytes: 40 B
type: 2 * var * int64
slot0, slot1 = ak.unzip(ak_array_tup)
slot1
[[1, 2],
 []]
---------------------
backend: cpu
nbytes: 40 B
type: 2 * var * int64

That way, you can name the variables anything you like.

If fields are missing from some records, the missing values are filled in with None (option type: more on that below).

ak.Array([{"x": 1, "y": [1, 2]}, {"x": 2}])
[{x: 1, y: [1, 2]},
 {x: 2, y: None}]
------------------------------------------------------
backend: cpu
nbytes: 64 B
type: 2 * {
    x: int64,
    y: option[var * int64]
}

If some tuples have different lengths, the resulting Awkward Array is taken to be heterogeneous (union type: more on that below).

ak.Array([(1, [1, 2]), (2,)])
[(1, [1, 2]),
 (2)]
--------------------------------------------------------------------------------------------
backend: cpu
nbytes: 66 B
type: 2 * union[
    (
        int64,
        var * int64
    ),
    (
        int64
    )
]

An Awkward Record is a scalar drawn from a record array, so an ak.Record can be built from a single dict with string-valued keys.

ak.Record({"x": 1, "y": [1, 2], "z": 3.3})
{x: 1,
 y: [1, 2],
 z: 3.3}
--------------------------------------------------------
backend: cpu
nbytes: 32 B
type: {
    x: int64,
    y: 2 * int64,
    z: float64
}

The same is not true for tuples. The ak.Record constructor expects named fields.

ak.Record((1, [1, 2], 3.3))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[69], line 1
----> 1 ak.Record((1, [1, 2], 3.3))

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/highlevel.py:1853, in Record.__init__(self, data, behavior, with_name, check_valid, backend, attrs, named_axis)
   1850     layout = ak.record.Record(ak.contents.RecordArray(contents, fields), at=0)
   1852 elif isinstance(data, Iterable):
-> 1853     raise TypeError(
   1854         "could not convert non-dict into an ak.Record; try ak.Array"
   1855     )
   1857 else:
   1858     layout = None

TypeError: could not convert non-dict into an ak.Record; try ak.Array

Missing values: Python None#

Python’s None can appear anywhere in the structure parsed by ak.from_iter(). It makes all data at that level of nesting have option type and is represented in ak.to_list() as None.

ak.Array([1.1, 2.2, None, 3.3, None, 4.4])
[1.1,
 2.2,
 None,
 3.3,
 None,
 4.4]
------------------
backend: cpu
nbytes: 80 B
type: 6 * ?float64
ak.Array([1.1, 2.2, None, 3.3, None, 4.4]).to_list()
[1.1, 2.2, None, 3.3, None, 4.4]

Note

Advanced topic: the rest of this section describes the equivalence of missing record fields and record fields with None values, which is only relevant to datasets with missing fields.

As described above, fields that are absent from some records but not others are filled in with None. As a consequence, conversions from Python to Awkward Array back to Python don’t necessarily result in the original expression:

ak.Array(
    [
        {"x": 1.1, "y": [1]},
        {"x": 2.2, "z": "two"},
        {"x": 3.3, "y": [1, 2, 3], "z": "three"},
    ]
).to_list()
[{'x': 1.1, 'y': [1], 'z': None},
 {'x': 2.2, 'y': None, 'z': 'two'},
 {'x': 3.3, 'y': [1, 2, 3], 'z': 'three'}]

This is a deliberate choice. It would have been possible to convert records with missing fields into arrays with union type (more on that below), for which ak.to_list would result in the original expression,

ak.concatenate(
    [
        ak.Array([{"x": 1.1, "y": [1]}]),
        ak.Array([{"x": 2.2, "z": "two"}]),
        ak.Array([{"x": 3.3, "y": [1, 2, 3], "z": "three"}]),
    ]
).to_list()
[{'x': 1.1, 'y': [1]},
 {'x': 2.2, 'z': 'two'},
 {'x': 3.3, 'y': [1, 2, 3], 'z': 'three'}]

But typical datasets of records with different sets of fields represent missing fields, rather than entirely different types of objects. (Even in particle physics applications that mix “electron objects” with “photon objects,” both types of objects have the same trajectory fields "x", "y", "z" and differ in fields that exist for one and not the other, such as "charge" for electrons but not photons.)

The memory use of union arrays scales with the number of different types, up to $2^n$ for records with $n$ potentially missing fields. Option types of completely disjoint records with $n_1$ and $n_2$ fields use a memory footprint that scales as $n_1 + n_2$. Treating disjoint records as a single record type with missing fields is a recoverable mistake, but treating a single record type with missing fields as a distinct type for every combination of missing fields is potentially disastrous.
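The $2^n$ figure is simply the number of subsets of the optional fields. The following pure-Python sketch counts them for a hypothetical set of four optional field names; it illustrates the counting argument only and does not measure Awkward’s actual memory use.

```python
from itertools import combinations

optional_fields = ["a", "b", "c", "d"]  # hypothetical optional field names

# Union interpretation: every distinct combination of present fields
# could become its own record type.
n_union_types = sum(
    1
    for k in range(len(optional_fields) + 1)
    for _ in combinations(optional_fields, k)
)

# Option interpretation: one record type, one option-typed field per name.
n_option_fields = len(optional_fields)

print(n_union_types, n_option_fields)
```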

Tuples of different lengths, on the other hand, are assumed to be different types because mistaking slot $i$ for slot $i + 1$ would create unions anyway.

ak.Array(
    [
        (1.1, [1]),
        (2.2, "two"),
        (3.3, [1, 2, 3], "three"),
    ]
).to_list()
[(1.1, [1]), (2.2, 'two'), (3.3, [1, 2, 3], 'three')]

Union types: heterogeneous data#

If the data in a Python iterable have different types at the same level of nesting (“heterogeneous”), the Awkward Arrays produced by ak.from_iter() have union types.

Most Awkward operations are defined on union-typed Arrays, but they’re generally not as efficient as the same operations on simply typed Arrays.

The following example mixes numbers (float64) with lists (var * int64).

ak.Array([1.1, 2.2, [], [1], [1, 2], 3.3])
[1.1,
 2.2,
 [],
 [1],
 [1, 2],
 3.3]
-----------------------------------------------
backend: cpu
nbytes: 134 B
type: 6 * union[
    float64,
    var * int64
]

The ak.to_list() function converts it back into a heterogeneous Python list.

ak.Array([1.1, 2.2, [], [1], [1, 2], 3.3]).to_list()
[1.1, 2.2, [], [1], [1, 2], 3.3]

Any types may be mixed: numbers and lists, lists and records, missing data, etc.

ak.Array([[1, 2, 3], {"x": 1, "y": 2}, None])
[[1, 2, 3],
 {x: 1, y: 2},
 None]
-------------------------------------------------------------------------------------------
backend: cpu
nbytes: 99 B
type: 3 * union[
    option[var * int64],
    ?{
        x: int64,
        y: int64
    }
]

One exception is that numerical data are merged without creating a union type: integers are expanded to floating point numbers.

ak.Array([1, 2, 3, 4, 5.5, 6.6, 7.7, 8, 9])
[1,
 2,
 3,
 4,
 5.5,
 6.6,
 7.7,
 8,
 9]
-----------------
backend: cpu
nbytes: 72 B
type: 9 * float64

But booleans are not merged with integers.

ak.Array([1, 2, 3, True, True, False, 4, 5])
[1,
 2,
 3,
 True,
 True,
 False,
 4,
 5]
--------------------------------------
backend: cpu
nbytes: 115 B
type: 8 * union[
    int64,
    bool
]

As described above, records with different sets of fields are presumed to be a single record type with missing values.

ak.type(
    ak.Array(
        [
            {"x": 1.1, "y": [1]},
            {"x": 2.2, "z": "two"},
            {"x": 3.3, "y": [1, 2, 3], "z": "three"},
        ]
    )
)
ArrayType(RecordType([NumpyType('float64'), OptionType(ListType(NumpyType('int64'))), OptionType(ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'}))], ['x', 'y', 'z']), 3, None)

But tuples with different lengths are presumed to be distinct types.

ak.type(
    ak.Array(
        [
            (1.1, [1]),
            (2.2, "two"),
            (3.3, [1, 2, 3], "three"),
        ]
    )
)
ArrayType(UnionType([RecordType([NumpyType('float64'), UnionType([ListType(NumpyType('int64')), ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'})])], None), RecordType([NumpyType('float64'), ListType(NumpyType('int64')), ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'})], None)]), 3, None)

More control over conversions#

The conversions described above are applied by ak.from_iter() when it maps data into an ak.ArrayBuilder. For more control over the conversion process (e.g. to make unions of records), use ak.ArrayBuilder directly.