Awkward Array features that are supported in Numba-compiled functions#
See the Numba documentation, which maintains lists of supported Python features and supported NumPy features in JIT-compiled functions. This page describes the supported Awkward Array library features.
import awkward as ak
import numpy as np
import numba as nb
Passing Awkward Arrays as arguments to a function#
The main use is to pass an Awkward Array into a function that has been JIT-compiled by Numba. As many arguments as you want can be Awkward Arrays, and they don’t have to have the same length or shape.
array1 = ak.Array([[0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])
array2 = ak.Array([
    [{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}, {"x": 3.3, "y": [1, 2, 3]}],
    [],
    [{"x": 4.4, "y": [1, 2, 3, 4]}, {"x": 5.5, "y": [1, 2, 3, 4, 5]}]
])

@nb.jit
def first_array(array):
    for i, list_of_numbers in enumerate(array):
        for x in list_of_numbers:
            if x == 3.3:
                return i

@nb.jit
def second_array(array):
    for i, list_of_records in enumerate(array):
        for record in list_of_records:
            if record.x == 3.3:
                return i

@nb.jit
def where_is_3_point_3(a, b):
    return first_array(a), second_array(b)

where_is_3_point_3(array1, array2)
(2, 0)
The only constraint is that union types can’t be accessed within the compiled function. (Heterogeneous parts of an array can be ignored and passed through a compiled function.)
Returning Awkward Arrays from a function#
Parts of the input array can be returned from a compiled function.
@nb.jit
def first_array(array):
    for list_of_numbers in array:
        for x in list_of_numbers:
            if x == 3.3:
                return list_of_numbers

@nb.jit
def second_array(array):
    for list_of_records in array:
        for record in list_of_records:
            if record.x == 3.3:
                return record

@nb.jit
def find_3_point_3(a, b):
    return first_array(a), second_array(b)

found_a, found_b = find_3_point_3(array1, array2)
found_a
[3.3, 4.4]
-----------------
backend: cpu
nbytes: 16 B
type: 2 * float64
found_b
<Record {x: 3.3, y: [1, ..., 3]} type='{x: float64, y: var * int64}'>
Cannot use ak.* functions or ufuncs#
Outside of a compiled function, Awkward’s vectorized ak.* functions and NumPy’s universal functions (ufuncs) should be strongly preferred over for-loop iteration because they are much faster.
Inside of a compiled function, however, they can’t be used at all. Use for-loops and if-statements instead.
This is an either-or choice at the boundary of a @nb.jit-compiled function. (Even if the ak.* functions had been implemented in Numba’s compiled context, they would be slower than compiled for-loops and if-statements because of the intermediate arrays they would necessarily create.)
Cannot use fancy slicing#
Similarly, any slicing other than
a single integer, like
array[i]
wherei
is an integer, ora single record field as a constant, literal string, like
array["x"]
orarray.x
,
is not allowed. Unpack the data structures one level at a time.
Casting one-dimensional arrays as NumPy#
One-dimensional Awkward Arrays of numbers, which are completely equivalent to NumPy arrays, can be cast as NumPy arrays within the compiled function.
@nb.jit
def return_last_y_list_squared(array):
    y_list_squared = None
    for list_of_records in array:
        for record in list_of_records:
            y_list_squared = np.asarray(record.y)**2
    return y_list_squared

return_last_y_list_squared(array2)
array([ 1, 4, 9, 16, 25])
This ability to cast Awkward Arrays as NumPy arrays, and then use NumPy’s ufuncs or fancy slicing, softens the rule against vectorized functions in the compiled context. (However, making intermediate NumPy arrays is just as bad as making intermediate Awkward Arrays.)
Creating new arrays with ak.ArrayBuilder#
Numba can create NumPy arrays inside a compiled function and return them as NumPy arrays in Python, but Awkward Arrays are more complex and this is not possible. (Aside from implementation, what would be the interface? Data in Numba’s compiled context must be fully typed, and Awkward Array types are complex.)
Instead, arrays can be built with ak.ArrayBuilder, which can be used in compiled contexts and discovers type dynamically. Each ak.ArrayBuilder must be instantiated outside of a compiled function and passed in, and then its ak.ArrayBuilder.snapshot() (which creates the ak.Array) must be called outside of the compiled function, like this:
@nb.jit
def create_ragged_array(builder, n):
    for i in range(n):
        builder.begin_list()
        for j in range(i):
            builder.integer(j)
        builder.end_list()
    return builder

builder = ak.ArrayBuilder()
create_ragged_array(builder, 10)
array = builder.snapshot()
array
[[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]]
-----------------------------
backend: cpu
nbytes: 448 B
type: 10 * var * int64
or, more succinctly,
create_ragged_array(ak.ArrayBuilder(), 10).snapshot()
[[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]]
-----------------------------
backend: cpu
nbytes: 448 B
type: 10 * var * int64
Note that we didn’t need to specify that the type of the data would be var * int64; this was determined by the way that ak.ArrayBuilder was called: ak.ArrayBuilder.integer() was only ever called between ak.ArrayBuilder.begin_list() and ak.ArrayBuilder.end_list(), and hence the type is var * int64.
Note that ak.ArrayBuilder can be used outside of compiled functions, too, so it can be tested interactively:
with builder.record():
    builder.field("x").real(3.14)
    with builder.field("y").list():
        builder.string("one")
        builder.string("two")
        builder.string("three")

builder.snapshot()
[[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8], {x: 3.14, y: ['one', 'two', 'three']}]
--------------------------------------------------------------------------------------------
backend: cpu
nbytes: 614 B
type: 11 * union[ var * int64, { x: float64, y: var * string } ]
But the context managers, with builder.record() and with builder.list(), don’t work in Numba-compiled functions because Numba does not yet support them as a language feature.
Overriding behavior with ak.behavior#
Just as behaviors can be customized for Awkward Arrays in general, they can be customized in the compiled context as well. See the last section of the ak.behavior reference for details.