How to use Awkward Arrays in C++ with cppyy

Warning

Awkward Array works only with cppyy 3.1 or later.

Warning

If you have installed ROOT, cppyy must be installed in a different venv or conda environment from ROOT, because the two packages define modules with conflicting names.

cppyy is an automatic, run-time Python-C++ bindings generator for calling C++ from Python and Python from C++. It is based on the C++ interpreter Cling.

cppyy can understand Awkward Arrays. When an ak.Array is passed to a C++ function defined in cppyy, the array's __cast_cpp__ magic method is invoked. This method dynamically generates a C++ type and a view of the array, if they have not been generated yet.

The view is a lightweight 40-byte C++ object allocated on the stack. It is generated on demand and only once per Awkward Array; the data are not copied.

import awkward as ak
ak.__version__
'2.7.2'
import awkward._connect.cling
import cppyy
cppyy.__version__
'3.1.0'

Let’s define an Awkward Array as a list of records:

array = ak.Array(
    [
        [{"x": 1, "y": [1.1]}, {"x": 2, "y": [2.2, 0.2]}],
        [],
        [{"x": 3, "y": [3.0, 0.3, 3.3]}],
    ]
)
array
[[{x: 1, y: [1.1]}, {x: 2, y: [2.2, 0.2]}],
 [],
 [{x: 3, y: [3, 0.3, 3.3]}]]
------------------------------------------------------
backend: cpu
nbytes: 136 B
type: 3 * var * {
    x: int64,
    y: var * float64
}

This example shows a templated C++ function that takes an Awkward Array and iterates over the list of records:

source_code = """
template<typename T>
double go_fast_cpp(T& awkward_array) {
    double out = 0.0;

    for (auto list : awkward_array) {
        for (auto record : list) {
            for (auto item : record.y()) {
                out += item;
            }
        }
    }

    return out;
}
"""

cppyy.cppdef(source_code)
True
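
For comparison, the same traversal can be sketched in pure Python over plain lists and dicts. This is only an illustration: go_slow_python is a hypothetical name that does not exist in Awkward Array or cppyy, and the nested-list literal is assumed to mirror what array.to_list() would return for the array defined above.

```python
# Plain-Python mirror of the example data (assumed equal to array.to_list()).
data = [
    [{"x": 1, "y": [1.1]}, {"x": 2, "y": [2.2, 0.2]}],
    [],
    [{"x": 3, "y": [3.0, 0.3, 3.3]}],
]


def go_slow_python(nested):
    """Sum every item of every record's "y" field, as go_fast_cpp does."""
    out = 0.0
    for lst in nested:                # outer lists
        for record in lst:            # records within each list
            for item in record["y"]:  # items of the "y" field
                out += item
    return out


total = go_slow_python(data)
print(total)
```

The three nested loops correspond one-to-one to the range-based for loops in the C++ source; the only difference is that the C++ version reads the field through the generated accessor record.y() instead of a dict lookup.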

The C++ type of an Awkward Array is a generated type whose name ends in a hash, for example awkward::ListArray_UkUyunNJYms. The exact name can differ from one session to another.

array.cpp_type
'awkward::ListArray_UkUyunNJYms'

Awkward Arrays are dynamically typed, so in a C++ context, the type name is hashed. In practice, there is no need to know the type: C++ code should use the placeholder type specifier auto, so that the type of the variable being declared is deduced automatically from its initializer.
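
One way to picture how a hashed name and "generate only once" fit together is the sketch below. It is not Awkward Array's actual implementation: hashed_type_name, get_or_generate, the registry, and the type-description string are all hypothetical, and the hashing scheme (SHA-256, base64, 11 characters) is an assumption chosen only to yield a valid C++ identifier.

```python
import base64
import hashlib

# Hypothetical registry of C++ types that have already been generated.
_generated_types = {}


def hashed_type_name(prefix, type_description):
    """Derive a C++-identifier-safe name from a description of the type."""
    digest = hashlib.sha256(type_description.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    # Keep only alphanumeric characters so the suffix is valid in an identifier.
    suffix = "".join(ch for ch in encoded if ch.isalnum())[:11]
    return f"{prefix}_{suffix}"


def get_or_generate(prefix, type_description):
    """Return (name, was_generated); the type is generated on first use only."""
    name = hashed_type_name(prefix, type_description)
    if name in _generated_types:
        return name, False
    # Placeholder for the generated C++ source (assumption, not real output).
    _generated_types[name] = f"class {name} {{ /* generated code */ }};"
    return name, True


description = "var * {x: int64, y: var * float64}"
first = get_or_generate("ListArray", description)
second = get_or_generate("ListArray", description)
print(first, second)
```

Because the name is a deterministic function of the type description, a second request for the same type finds the existing entry and nothing is regenerated.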

In a Python context, when a templated function requires a C++ type as a Python string, it can use the ak.Array.cpp_type property:

out = cppyy.gbl.go_fast_cpp[array.cpp_type](array)
%%timeit

out = cppyy.gbl.go_fast_cpp[array.cpp_type](array)
5.35 μs ± 17 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%%timeit

ak.sum(array["y"])
208 μs ± 6.23 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In these timings, the C++ function is considerably faster than ak.sum on this small array, but the result is the same.

assert out == ak.sum(array["y"])
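
The check above compares two floating-point sums exactly. If two implementations add the same numbers in a different order, the results can differ in the last bits, so a tolerance-based comparison may be more robust. The sketch below uses only the standard library; its values are illustrative and do not come from ak.sum or cppyy.

```python
import math

values = [0.1, 0.2, 0.3]

# Accumulate left to right.
forward = 0.0
for v in values:
    forward += v

# Accumulate right to left.
backward = 0.0
for v in reversed(values):
    backward += v

print(forward == backward)              # False: rounding depends on the order
print(math.isclose(forward, backward))  # True: equal within a relative tolerance
```

With math.isclose, the default relative tolerance is 1e-09; whether that is appropriate depends on the magnitude and number of values being summed.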