{ "cells": [ { "cell_type": "markdown", "id": "1b68bccb-a98b-48c2-889a-10e5c4822eeb", "metadata": { "user_expressions": [] }, "source": [ "# How to use Awkward Arrays in C++ with cppyy" ] }, { "cell_type": "markdown", "id": "f80c4c0d-f4da-4e1e-adb8-90f8482d23e8", "metadata": { "tags": [], "user_expressions": [] }, "source": [ ":::{warning}\n", "\n", "Awkward Array only works with `cppyy` 3.1 or later.\n", ":::\n", "\n", ":::{warning}\n", "If you have installed ROOT, `cppyy` must be in a different venv or conda environment from it, because the two packages define modules with conflicting names.\n", ":::\n", "\n", "[cppyy](https://cppyy.readthedocs.io/en/latest/index.html) is an automatic, run-time Python-C++ bindings generator for calling C++ from Python and Python from C++. `cppyy` is based on the C++ interpreter `Cling`.\n", "\n", "`cppyy` can understand Awkward Arrays. When an {class}`ak.Array` is passed to a C++ function defined in `cppyy`, the `__cast_cpp__` magic method of the {class}`ak.Array` is invoked. This method dynamically generates a C++ type and a view of the array, if they have not been generated yet.\n", "\n", "The view is a lightweight 40-byte C++ object dynamically allocated on the stack. It is generated on demand and only once per Awkward Array, and the data are not copied." 
] }, { "cell_type": "code", "execution_count": 1, "id": "48778e8a", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:04.328991Z", "iopub.status.busy": "2024-12-18T19:05:04.328822Z", "iopub.status.idle": "2024-12-18T19:05:04.542865Z", "shell.execute_reply": "2024-12-18T19:05:04.542254Z" } }, "outputs": [ { "data": { "text/plain": [ "'2.7.2'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import awkward as ak\n", "ak.__version__" ] }, { "cell_type": "code", "execution_count": 2, "id": "dd32d294", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:04.544980Z", "iopub.status.busy": "2024-12-18T19:05:04.544562Z", "iopub.status.idle": "2024-12-18T19:05:04.548120Z", "shell.execute_reply": "2024-12-18T19:05:04.547693Z" } }, "outputs": [], "source": [ "import awkward._connect.cling" ] }, { "cell_type": "code", "execution_count": 3, "id": "a87d01b8", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:04.549822Z", "iopub.status.busy": "2024-12-18T19:05:04.549488Z", "iopub.status.idle": "2024-12-18T19:05:05.961997Z", "shell.execute_reply": "2024-12-18T19:05:05.961430Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Re-)building pre-compiled headers (options: -O2 -march=native); this may take a minute ...\n", "ERROR: cannot find etc/dictpch/allHeaders.h file here ./etc/dictpch/allHeaders.h nor here etc/dictpch/allHeaders.h\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/hostedtoolcache/Python/3.11.0/x64/lib/python3.11/site-packages/cppyy_backend/loader.py:139: UserWarning: No precompiled header available (failed to build); this may impact performance.\n", " warnings.warn('No precompiled header available (%s); this may impact performance.' 
% msg)\n", "input_line_10:2:45: error: explicit instantiation of '_M_use_local_data' does not refer to a function template, variable template, member function, member class, or static data member\n", "template std::string::pointer std::string::_M_use_local_data();\n", " ^\n", "input_line_10:3:46: error: explicit instantiation of '_M_use_local_data' does not refer to a function template, variable template, member function, member class, or static data member\n", "template std::wstring::pointer std::wstring::_M_use_local_data();\n", " ^\n" ] }, { "data": { "text/plain": [ "'3.1.0'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import cppyy\n", "cppyy.__version__" ] }, { "cell_type": "markdown", "id": "39b1877e", "metadata": {}, "source": [ "Let's define an Awkward Array as a list of records:" ] }, { "cell_type": "code", "execution_count": 4, "id": "97f90216", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:05.964094Z", "iopub.status.busy": "2024-12-18T19:05:05.963718Z", "iopub.status.idle": "2024-12-18T19:05:05.970248Z", "shell.execute_reply": "2024-12-18T19:05:05.969720Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
[[{x: 1, y: [1.1]}, {x: 2, y: [2.2, 0.2]}],\n",
       " [],\n",
       " [{x: 3, y: [3, 0.3, 3.3]}]]\n",
       "------------------------------------------------------\n",
       "backend: cpu\n",
       "nbytes: 136 B\n",
       "type: 3 * var * {\n",
       "    x: int64,\n",
       "    y: var * float64\n",
       "}
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "array = ak.Array(\n", "    [\n", "        [{\"x\": 1, \"y\": [1.1]}, {\"x\": 2, \"y\": [2.2, 0.2]}],\n", "        [],\n", "        [{\"x\": 3, \"y\": [3.0, 0.3, 3.3]}],\n", "    ]\n", ")\n", "array" ] }, { "cell_type": "markdown", "id": "b5cc3b3b-8426-4def-96d5-be1314847bc4", "metadata": {}, "source": [ "This example shows a templated C++ function that takes an Awkward Array and iterates over the list of records:" ] }, { "cell_type": "code", "execution_count": 5, "id": "d4294ad6", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:05.971974Z", "iopub.status.busy": "2024-12-18T19:05:05.971661Z", "iopub.status.idle": "2024-12-18T19:05:05.984531Z", "shell.execute_reply": "2024-12-18T19:05:05.983954Z" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "source_code = \"\"\"\n", "template <typename T>\n", "double go_fast_cpp(T& awkward_array) {\n", "  double out = 0.0;\n", "\n", "  for (auto list : awkward_array) {\n", "    for (auto record : list) {\n", "      for (auto item : record.y()) {\n", "        out += item;\n", "      }\n", "    }\n", "  }\n", "\n", "  return out;\n", "}\n", "\"\"\"\n", "\n", "cppyy.cppdef(source_code)" ] }, { "cell_type": "markdown", "id": "acecac23-fe0b-485c-8eaf-11c926124217", "metadata": {}, "source": [ "The C++ type of an Awkward Array is a made-up type with a hashed name,\n", "such as `awkward::ListArray_1ZX1N3Hqzeg` for the array above:" 
] }, { "cell_type": "code", "execution_count": 6, "id": "03ab8a70", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:05.986536Z", "iopub.status.busy": "2024-12-18T19:05:05.986204Z", "iopub.status.idle": "2024-12-18T19:05:06.119822Z", "shell.execute_reply": "2024-12-18T19:05:06.119202Z" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'awkward::ListArray_1ZX1N3Hqzeg'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "array.cpp_type" ] }, { "cell_type": "markdown", "id": "240fcb74", "metadata": {}, "source": [ "Awkward Arrays are dynamically typed, so in a C++ context the type name is hashed. In practice, there is no need to know the type: C++ code should use the placeholder type specifier `auto`, and the type of the declared variable is deduced automatically from its initializer.\n", "\n", "In Python, when a templated function requires a C++ type as a Python string, use the `ak.Array.cpp_type` property:" ] }, { "cell_type": "code", "execution_count": 7, "id": "0bea9b7d", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:06.121729Z", "iopub.status.busy": "2024-12-18T19:05:06.121544Z", "iopub.status.idle": "2024-12-18T19:05:06.178732Z", "shell.execute_reply": "2024-12-18T19:05:06.178192Z" } }, "outputs": [], "source": [ "out = cppyy.gbl.go_fast_cpp[array.cpp_type](array)" ] }, { "cell_type": "code", "execution_count": 8, "id": "f0fb71ec", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:06.180664Z", "iopub.status.busy": "2024-12-18T19:05:06.180352Z", "iopub.status.idle": "2024-12-18T19:05:10.528965Z", "shell.execute_reply": "2024-12-18T19:05:10.528382Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5.3 μs ± 14.1 ns per loop (mean ± std. dev. 
of 7 runs, 100,000 loops each)\n" ] } ], "source": [ "%%timeit\n", "\n", "out = cppyy.gbl.go_fast_cpp[array.cpp_type](array)" ] }, { "cell_type": "code", "execution_count": 9, "id": "11a7ecec", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:10.530894Z", "iopub.status.busy": "2024-12-18T19:05:10.530562Z", "iopub.status.idle": "2024-12-18T19:05:12.268111Z", "shell.execute_reply": "2024-12-18T19:05:12.267519Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "213 μs ± 6.33 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n" ] } ], "source": [ "%%timeit\n", "\n", "ak.sum(array[\"y\"])" ] }, { "cell_type": "markdown", "id": "a1d1a195-9b14-4c48-b8af-f4b01e120537", "metadata": {}, "source": [ "On this small array, the C++ loop runs considerably faster than `ak.sum`, but the result is the same." ] }, { "cell_type": "code", "execution_count": 10, "id": "1d23590b", "metadata": { "execution": { "iopub.execute_input": "2024-12-18T19:05:12.270028Z", "iopub.status.busy": "2024-12-18T19:05:12.269710Z", "iopub.status.idle": "2024-12-18T19:05:12.272891Z", "shell.execute_reply": "2024-12-18T19:05:12.272349Z" }, "tags": [] }, "outputs": [], "source": [ "assert out == ak.sum(array[\"y\"])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }