How to ensure that an array is valid

Awkward Arrays are complex data structures with their own rules for internal consistency. In principle, all data sources should serve valid array structures and all operations on valid structures should return valid structures. However, errors sometimes happen.

Awkward Array’s compiled routines check for validity in the course of computation, so that errors are reported as Python exceptions, rather than undefined behavior or segmentation faults. However, those errors can be hard to understand because the invalid structure might have been constructed much earlier in a program than the point where it is discovered.

For that reason, Awkward Array provides tools to check an array’s internal validity: ak.is_valid(), ak.validity_error(), and the check_valid argument to constructors like ak.Array.

import awkward as ak

To demonstrate, here’s a valid array:

array_is_valid = ak.Array([[0, 1, 2], [], [3, 4], [5], [6, 7, 8, 9]])
array_is_valid
[[0, 1, 2],
 [],
 [3, 4],
 [5],
 [6, 7, 8, 9]]
---------------------
type: 5 * var * int64

Here is a copy of it, which we will make invalid by overwriting one of its offsets.

array_is_invalid = ak.copy(array_is_valid)
array_is_invalid.layout
<ListOffsetArray len='5'>
    <offsets><Index dtype='int64' len='6'>
        [ 0  3  3  5  6 10]
    </Index></offsets>
    <content><NumpyArray dtype='int64' len='10'>[0 1 2 3 4 5 6 7 8 9]</NumpyArray></content>
</ListOffsetArray>
array_is_invalid.layout.offsets.data
array([ 0,  3,  3,  5,  6, 10])
array_is_invalid.layout.offsets.data[3] = 100

array_is_invalid.layout
<ListOffsetArray len='5'>
    <offsets><Index dtype='int64' len='6'>
        [  0   3   3 100   6  10]
    </Index></offsets>
    <content><NumpyArray dtype='int64' len='10'>[0 1 2 3 4 5 6 7 8 9]</NumpyArray></content>
</ListOffsetArray>

The ak.is_valid() function only tells us whether an array is valid or not:

ak.is_valid(array_is_valid)
True
ak.is_valid(array_is_invalid)
False

The ak.validity_error() function goes further: it returns a message describing the error, or an empty string if the array is valid.

ak.validity_error(array_is_valid)
''
ak.validity_error(array_is_invalid)
'at highlevel ("<class \'awkward.contents.listoffsetarray.ListOffsetArray\'>"): stop[i] > len(content) at i=2 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-43/awkward-cpp/src/cpu-kernels/awkward_ListArray_validity.cpp#L24)'
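The i=2 in this message refers to the third list: its start is offsets[2] = 3 and its stop is offsets[3] = 100, which exceeds the content length of 10. The pure-Python sketch below mimics that kind of consistency check to show where the index comes from. The function name first_offset_error is hypothetical, and this is a simplified illustration, not the library's compiled kernel.

```python
def first_offset_error(offsets, content_length):
    """Return (i, reason) for the first inconsistent list, or None if none is found.

    Hypothetical helper for illustration only; it covers just two of the
    conditions a real validity check would need.
    """
    for i in range(len(offsets) - 1):
        start, stop = offsets[i], offsets[i + 1]
        if start > stop:
            return i, "start[i] > stop[i]"
        if stop > content_length:
            return i, "stop[i] > len(content)"
    return None


# Offsets of the valid array: every list lies inside the 10-element content.
print(first_offset_error([0, 3, 3, 5, 6, 10], 10))    # None

# Offsets after the in-place edit: list 2 would span content[3:100].
print(first_offset_error([0, 3, 3, 100, 6, 10], 10))  # (2, 'stop[i] > len(content)')
```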

If you suspect that an array is invalid, or that it becomes invalid in the course of your program, you can either call these functions directly or pass check_valid=True to the ak.Array constructor, which raises a ValueError if the array is invalid.

ak.Array(array_is_valid, check_valid=True)
[[0, 1, 2],
 [],
 [3, 4],
 [5],
 [6, 7, 8, 9]]
---------------------
type: 5 * var * int64
ak.Array(array_is_invalid, check_valid=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[12], line 1
----> 1 ak.Array(array_is_invalid, check_valid=True)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/highlevel.py:365, in Array.__init__(self, data, behavior, with_name, check_valid, backend, attrs, named_axis)
    362 self._update_class()
    364 if check_valid:
--> 365     ak.operations.validity_error(self, exception=True)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:38, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     35 @wraps(func)
     36 def dispatch(*args, **kwargs):
     37     # NOTE: this decorator assumes that the operation is exposed under `ak.`
---> 38     with OperationErrorContext(name, args, kwargs):
     39         gen_or_result = func(*args, **kwargs)
     40         if isgenerator(gen_or_result):

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_errors.py:80, in ErrorContext.__exit__(self, exception_type, exception_value, traceback)
     78     self._slate.__dict__.clear()
     79     # Handle caught exception
---> 80     raise self.decorate_exception(exception_type, exception_value)
     81 else:
     82     # Step out of the way so that another ErrorContext can become primary.
     83     if self.primary() is self:

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:64, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     62 # Failed to find a custom overload, so resume the original function
     63 try:
---> 64     next(gen_or_result)
     65 except StopIteration as err:
     66     return err.value

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_validity_error.py:31, in validity_error(array, exception)
     28 yield (array,)
     30 # Implementation
---> 31 return _impl(array, exception)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_validity_error.py:41, in _impl(array, exception)
     38 out = ak._do.validity_error(layout, path="highlevel")
     40 if out not in (None, "") and exception:
---> 41     raise ValueError(out)
     42 else:
     43     return out

ValueError: at highlevel ("<class 'awkward.contents.listoffsetarray.ListOffsetArray'>"): stop[i] > len(content) at i=2 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-43/awkward-cpp/src/cpu-kernels/awkward_ListArray_validity.cpp#L24)

This error occurred while calling

    ak.validity_error(
        <Array [[0, 1, 2], [], ..., [], [6, 7, 8, 9]] type='5 * var * int64'>
        exception = True
    )