{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1b68bccb-a98b-48c2-889a-10e5c4822eeb",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "# How to use Awkward Arrays in C++ with cppyy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f80c4c0d-f4da-4e1e-adb8-90f8482d23e8",
   "metadata": {
    "tags": [],
    "user_expressions": []
   },
   "source": [
    ":::{warning}\n",
    "\n",
    "Awkward Array works only with `cppyy` 3.1 or later.\n",
    ":::\n",
    "\n",
    ":::{warning}\n",
    "If you have installed ROOT, `cppyy` must be in a different venv or conda environment from it, because the two packages define modules with conflicting names.\n",
    ":::\n",
    "\n",
    "[cppyy](https://cppyy.readthedocs.io/en/latest/index.html) is an automatic, run-time Python-C++ bindings generator for calling C++ from Python and Python from C++. `cppyy` is based on the C++ interpreter `Cling`.\n",
    "\n",
    "`cppyy` can understand Awkward Arrays. When an {class}`ak.Array` is passed to a C++ function defined in `cppyy`, the array's `__cast_cpp__` magic method is invoked. This method dynamically generates a C++ type and a view of the array, if they have not been generated yet.\n",
    "\n",
    "The view is a lightweight 40-byte C++ object dynamically allocated on the stack. It is generated on demand, only once per Awkward Array, and the data are not copied."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "48778e8a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:21.283896Z",
     "iopub.status.busy": "2025-03-19T16:21:21.283733Z",
     "iopub.status.idle": "2025-03-19T16:21:21.493941Z",
     "shell.execute_reply": "2025-03-19T16:21:21.493376Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2.7.4'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import awkward as ak\n",
    "ak.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "dd32d294",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:21.495995Z",
     "iopub.status.busy": "2025-03-19T16:21:21.495604Z",
     "iopub.status.idle": "2025-03-19T16:21:21.499186Z",
     "shell.execute_reply": "2025-03-19T16:21:21.498730Z"
    }
   },
   "outputs": [],
   "source": [
    "import awkward._connect.cling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a87d01b8",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:21.500819Z",
     "iopub.status.busy": "2025-03-19T16:21:21.500504Z",
     "iopub.status.idle": "2025-03-19T16:21:23.305891Z",
     "shell.execute_reply": "2025-03-19T16:21:23.305393Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(Re-)building pre-compiled headers (options: -O2 -march=native); this may take a minute ...\n",
      "ERROR: cannot find etc/dictpch/allHeaders.h file here ./etc/dictpch/allHeaders.h nor here etc/dictpch/allHeaders.h\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/hostedtoolcache/Python/3.11.0/x64/lib/python3.11/site-packages/cppyy_backend/loader.py:139: UserWarning: No precompiled header available (failed to build); this may impact performance.\n",
      "  warnings.warn('No precompiled header available (%s); this may impact performance.' % msg)\n",
      "input_line_10:2:45: error: explicit instantiation of '_M_use_local_data' does not refer to a function template, variable template, member function, member class, or static data member\n",
      "template std::string::pointer  std::string::_M_use_local_data();\n",
      "                                            ^\n",
      "input_line_10:3:46: error: explicit instantiation of '_M_use_local_data' does not refer to a function template, variable template, member function, member class, or static data member\n",
      "template std::wstring::pointer std::wstring::_M_use_local_data();\n",
      "                                             ^\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'3.1.0'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import cppyy\n",
    "cppyy.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39b1877e",
   "metadata": {},
   "source": [
    "Let's define an Awkward Array as a list of records:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "97f90216",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:23.307839Z",
     "iopub.status.busy": "2025-03-19T16:21:23.307449Z",
     "iopub.status.idle": "2025-03-19T16:21:23.314279Z",
     "shell.execute_reply": "2025-03-19T16:21:23.313787Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[[{x: 1, y: [1.1]}, {x: 2, y: [2.2, 0.2]}],\n",
       " [],\n",
       " [{x: 3, y: [3, 0.3, 3.3]}]]\n",
       "------------------------------------------------------\n",
       "backend: cpu\n",
       "nbytes: 136 B\n",
       "type: 3 * var * {\n",
       "    x: int64,\n",
       "    y: var * float64\n",
       "}</pre>"
      ],
      "text/plain": [
       "<Array [[{x: 1, y: [1.1]}, {...}], ...] type='3 * var * {x: int64, y: var *...'>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "array = ak.Array(\n",
    "    [\n",
    "        [{\"x\": 1, \"y\": [1.1]}, {\"x\": 2, \"y\": [2.2, 0.2]}],\n",
    "        [],\n",
    "        [{\"x\": 3, \"y\": [3.0, 0.3, 3.3]}],\n",
    "    ]\n",
    ")\n",
    "array"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5cc3b3b-8426-4def-96d5-be1314847bc4",
   "metadata": {},
   "source": [
    "This example shows a templated C++ function that takes an Awkward Array, iterates over its lists of records, and sums every item in each record's `y` field:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d4294ad6",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:23.316106Z",
     "iopub.status.busy": "2025-03-19T16:21:23.315739Z",
     "iopub.status.idle": "2025-03-19T16:21:23.328214Z",
     "shell.execute_reply": "2025-03-19T16:21:23.327735Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "source_code = \"\"\"\n",
    "template<typename T>\n",
    "double go_fast_cpp(T& awkward_array) {\n",
    "    double out = 0.0;\n",
    "\n",
    "    for (auto list : awkward_array) {\n",
    "        for (auto record : list) {\n",
    "            for (auto item : record.y()) {\n",
    "                out += item;\n",
    "            }\n",
    "        }\n",
    "    }\n",
    "\n",
    "    return out;\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "cppyy.cppdef(source_code)"
   ]
  },
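  {
   "cell_type": "markdown",
   "id": "plain-python-sum-sketch",
   "metadata": {},
   "source": [
    "For reference, the traversal in `go_fast_cpp` corresponds roughly to the plain-Python loop below. This is only an illustrative sketch: the `data` literal and the `sum_y` name are assumptions introduced here, and the code works on ordinary nested lists and dictionaries rather than on the Awkward Array view that `cppyy` receives.\n",
    "\n",
    "```python\n",
    "# Plain-Python sketch of the loop in go_fast_cpp (illustrative only).\n",
    "# `data` mirrors the Awkward Array above as nested lists and dicts.\n",
    "data = [\n",
    "    [{\"x\": 1, \"y\": [1.1]}, {\"x\": 2, \"y\": [2.2, 0.2]}],\n",
    "    [],\n",
    "    [{\"x\": 3, \"y\": [3.0, 0.3, 3.3]}],\n",
    "]\n",
    "\n",
    "\n",
    "def sum_y(nested):\n",
    "    # Add up every item of every record's `y` field.\n",
    "    out = 0.0\n",
    "    for lst in nested:  # outer lists\n",
    "        for record in lst:  # records in each list\n",
    "            for item in record[\"y\"]:  # items of the nested `y` list\n",
    "                out += item\n",
    "    return out\n",
    "\n",
    "\n",
    "total = sum_y(data)\n",
    "```\n",
    "\n",
    "For the data above, `total` is approximately 10.1, which gives a value to compare against the output of the C++ function."
   ]
  },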
  {
   "cell_type": "markdown",
   "id": "acecac23-fe0b-485c-8eaf-11c926124217",
   "metadata": {},
   "source": [
    "The C++ type of an Awkward Array is a made-up type with a generated name, such as `awkward::ListArray_zjVI12uJA`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "03ab8a70",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:23.329994Z",
     "iopub.status.busy": "2025-03-19T16:21:23.329656Z",
     "iopub.status.idle": "2025-03-19T16:21:23.465291Z",
     "shell.execute_reply": "2025-03-19T16:21:23.464780Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'awkward::ListArray_zjVI12uJA'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "array.cpp_type"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "240fcb74",
   "metadata": {},
   "source": [
    "Awkward Arrays are dynamically typed, so in a C++ context, the type name is hashed. In practice, there is no need to know the type: C++ code should use the placeholder type specifier `auto`, so that the type of each declared variable is deduced from its initializer.\n",
    "\n",
    "In a Python context, when a templated function requires a C++ type as a Python string, the string can be obtained from the `ak.Array.cpp_type` property:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0bea9b7d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:23.467178Z",
     "iopub.status.busy": "2025-03-19T16:21:23.466734Z",
     "iopub.status.idle": "2025-03-19T16:21:23.523220Z",
     "shell.execute_reply": "2025-03-19T16:21:23.522758Z"
    }
   },
   "outputs": [],
   "source": [
    "out = cppyy.gbl.go_fast_cpp[array.cpp_type](array)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "f0fb71ec",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:23.525023Z",
     "iopub.status.busy": "2025-03-19T16:21:23.524622Z",
     "iopub.status.idle": "2025-03-19T16:21:27.923752Z",
     "shell.execute_reply": "2025-03-19T16:21:27.923170Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5.38 μs ± 34 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "out = cppyy.gbl.go_fast_cpp[array.cpp_type](array)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "11a7ecec",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:27.925623Z",
     "iopub.status.busy": "2025-03-19T16:21:27.925302Z",
     "iopub.status.idle": "2025-03-19T16:21:29.850413Z",
     "shell.execute_reply": "2025-03-19T16:21:29.849942Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "235 μs ± 6.33 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "ak.sum(array[\"y\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1d1a195-9b14-4c48-b8af-f4b01e120537",
   "metadata": {},
   "source": [
    "On this small array, the C++ loop runs faster than `ak.sum`, but the result is the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "1d23590b",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-03-19T16:21:29.852297Z",
     "iopub.status.busy": "2025-03-19T16:21:29.851957Z",
     "iopub.status.idle": "2025-03-19T16:21:29.855201Z",
     "shell.execute_reply": "2025-03-19T16:21:29.854635Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "assert out == ak.sum(array[\"y\"])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}