{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "558f5f22",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# Jagged, Ragged, Awkward Arrays!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f76191e",
   "metadata": {},
   "source": [
    "_Originally presented as [part](https://hsf-training.github.io/hsf-training-scikit-hep-webpage/04-awkward/index.html) of [HSF Scikit-HEP training on March 28, 2022](https://indico.cern.ch/event/1112526/)._"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8685b555",
   "metadata": {},
   "source": [
    "<br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8e3073f",
   "metadata": {},
   "source": [
    "NumPy can't represent an array of variable-length lists without resorting to arrays of objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa4b4479",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# generates a ValueError\n",
    "np.array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8513978",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Awkward Array is intended to fill this gap:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8bceb0e",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import awkward as ak\n",
    "\n",
    "ak.Array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2bc2788",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Arrays like this are sometimes called \"[jagged arrays](https://en.wikipedia.org/wiki/Jagged_array)\" and sometimes \"ragged arrays.\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07303759",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Slicing in Awkward Array"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d3e1679",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Basic slices are a generalization of NumPy's—what NumPy would do if it had variable-length lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "350fa738",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array = ak.Array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])\n",
    "array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "efcbf09b",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array[2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c9d32f6",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array[-1, 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65e1b348",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array[2:, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2adb679b",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array[2:, 1:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cdeea244",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [],
   "source": [
    "array[:, 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9c5aa3a",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "**Quick quiz:** why does the last one raise an error?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "543260f6",
   "metadata": {},
   "source": [
    "Boolean and integer slices work, too:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54783b4f",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array[[True, False, True, False, True]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7799c181",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array[[2, 3, 3, 1]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ba5a2da",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "As in NumPy, the boolean arrays used for slicing can be computed, and functions like [ak.num](https://awkward-array.readthedocs.io/en/latest/_auto/ak.num.html) are helpful for that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "528e98cf",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "ak.num(array)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68dfaa13",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "ak.num(array) > 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a84dc910",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array[ak.num(array) > 0, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7749a21b",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array[ak.num(array) > 1, 1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8c43b79",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Now consider this (similar to an example from the first lesson):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a781f45b",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "cut = array * 10 % 2 == 0\n",
    "cut"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2bd0fa71",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "array[cut]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3fc32773",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "This array, `cut`, is not just an array of booleans. It's a jagged array of booleans. All of its nested lists have the same lengths as `array`'s nested lists, so it selects individual numbers within each list, rather than selecting whole lists."
   ]
  },
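  {
   "cell_type": "markdown",
   "id": "c3a1f0e2",
   "metadata": {},
   "source": [
    "As a minimal sketch of the difference between selecting lists and selecting numbers, the example below applies a flat mask and a jagged mask to the same array. The names `data`, `flat_mask`, and `jagged_mask`, and their values, are hypothetical and chosen only for illustration.\n",
    "\n",
    "```python\n",
    "import awkward as ak\n",
    "\n",
    "# Hypothetical toy data, not taken from the lesson's file.\n",
    "data = ak.Array([[1, 2, 3], [], [4, 5]])\n",
    "\n",
    "# A flat mask has one boolean per list: it keeps or drops whole lists.\n",
    "flat_mask = ak.Array([True, False, True])\n",
    "whole_lists = data[flat_mask]\n",
    "\n",
    "# A jagged mask has one boolean per number: it selects inside each list.\n",
    "jagged_mask = data % 2 == 1\n",
    "odd_numbers = data[jagged_mask]\n",
    "\n",
    "print(whole_lists.to_list())  # [[1, 2, 3], [4, 5]]\n",
    "print(odd_numbers.to_list())  # [[1, 3], [], [5]]\n",
    "```\n",
    "\n",
    "The jagged selection keeps one list per original list (some possibly empty), whereas the flat selection changes the number of lists."
   ]
  },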
  {
   "cell_type": "markdown",
   "id": "347891c0",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Application: selecting particles, rather than events"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e515351",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Returning to the big TTree from the previous lesson,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10d7130c",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import uproot\n",
    "\n",
    "file = uproot.open(\n",
    "    \"https://github.com/jpivarski-talks/2023-12-18-hsf-india-tutorial-bhubaneswar/raw/main/data/SMHiggsToZZTo4L.root\"\n",
    ")\n",
    "tree = file[\"Events\"]\n",
    "\n",
    "muon_pt = tree[\"Muon_pt\"].array(entry_stop=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20cf355e",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "This jagged array of booleans selects all *muons* with transverse momentum above 20 GeV:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b24110bd",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "particle_cut = muon_pt > 20"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5811556c",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "muon_pt[particle_cut]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "746a9944",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "and this non-jagged array of booleans (made with [ak.any](https://awkward-array.readthedocs.io/en/latest/_auto/ak.any.html)) selects all events *that have* a muon with transverse momentum above 20 GeV:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "176255e3",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "event_cut = ak.any(muon_pt > 20, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8589d804",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "muon_pt[event_cut]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb0e32ee",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "**Quick quiz:** construct exactly the same `event_cut` using [ak.max](https://awkward-array.readthedocs.io/en/latest/_auto/ak.max.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f3472fa",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "**Quick quiz:** apply both cuts; that is, select muons with over 20 GeV from events that have them."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e068fb31",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "**Hint:** you'll want to make a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2af1c7bb",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "cleaned = muon_pt[particle_cut]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec7ce43b",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "intermediary and you can't use the variable `event_cut`, as-is."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e178dad",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "**Hint:** the final result should be a jagged array, just like `muon_pt`, but with fewer lists and fewer items in those lists."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c896b99",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Combinatorics in Awkward Array"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ecc3524",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Variable-length lists present more problems than just slicing and computing formulas array-at-a-time. Often, we want to combine particles in all possible pairs (within each event) to look for decay chains."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2914e2a",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "### Pairs from two arrays, pairs from a single array"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ec24b29",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Awkward Array has functions that generate these combinations. For instance, [ak.cartesian](https://awkward-array.readthedocs.io/en/latest/_auto/ak.cartesian.html) takes a Cartesian product per event (when `axis=1`, the default).\n",
    "\n",
    "![](cartoon-cartesian.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87277899",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "numbers = ak.Array([[1, 2, 3], [], [5, 7], [11]])\n",
    "letters = ak.Array([[\"a\", \"b\"], [\"c\"], [\"d\"], [\"e\", \"f\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03a7d18d",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "pairs = ak.cartesian((numbers, letters))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cd52754",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "These `pairs` are 2-tuples, which are like records in how they're sliced out of an array: using strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09e285e5",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "pairs[\"0\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8571883",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "pairs[\"1\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c06781d6",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "There's also [ak.unzip](https://awkward-array.readthedocs.io/en/latest/_auto/ak.unzip.html), which extracts every field into a separate array (opposite of [ak.zip](https://awkward-array.readthedocs.io/en/latest/_auto/ak.zip.html))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7453f65",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "lefts, rights = ak.unzip(pairs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0326b2aa",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "lefts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c119522a",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "rights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d27a1313",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Note that these `lefts` and `rights` are not the original `numbers` and `letters`: their elements have been duplicated so that the two arrays have the same shape and line up pair by pair.\n",
    "\n",
    "The Cartesian product is equivalent to this C++ `for` loop over two collections:\n",
    "\n",
    "```cpp\n",
    "for (int i = 0; i < numbers.size(); i++) {\n",
    "  for (int j = 0; j < letters.size(); j++) {\n",
    "    // compute formula with numbers[i] and letters[j]\n",
    "  }\n",
    "}\n",
    "```\n",
    "\n",
    "Sometimes, though, we want to find all pairs within a single collection, without repetition. That would be equivalent to this C++ `for` loop:\n",
    "\n",
    "```cpp\n",
    "for (int i = 0; i < numbers.size(); i++) {\n",
    "  for (int j = i + 1; j < numbers.size(); j++) {\n",
    "    // compute formula with numbers[i] and numbers[j]\n",
    "  }\n",
    "}\n",
    "```\n",
    "\n",
    "The Awkward function for this case is [ak.combinations](https://awkward-array.readthedocs.io/en/latest/_auto/ak.combinations.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad2ba0fd",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "![cartoon-combinations](cartoon-combinations.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "637cc498",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "pairs = ak.combinations(numbers, 2)\n",
    "pairs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6949c0a",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "lefts, rights = ak.unzip(pairs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a204ffd5",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "lefts * rights  # they line up, so we can compute formulas"
   ]
  },
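  {
   "cell_type": "markdown",
   "id": "d4b2a1f3",
   "metadata": {},
   "source": [
    "The loop equivalence described above can be checked directly. The sketch below rewrites the two nested loops in plain Python and compares them, event by event, with [ak.cartesian](https://awkward-array.readthedocs.io/en/latest/_auto/ak.cartesian.html) and [ak.combinations](https://awkward-array.readthedocs.io/en/latest/_auto/ak.combinations.html). The helpers `cartesian_loop` and `combinations_loop` are hypothetical names introduced only for this illustration.\n",
    "\n",
    "```python\n",
    "import awkward as ak\n",
    "\n",
    "numbers = ak.Array([[1, 2, 3], [], [5, 7], [11]])\n",
    "letters = ak.Array([[\"a\", \"b\"], [\"c\"], [\"d\"], [\"e\", \"f\"]])\n",
    "\n",
    "\n",
    "def cartesian_loop(xs, ys):\n",
    "    # Every x paired with every y, in nested-loop order.\n",
    "    return [(x, y) for x in xs for y in ys]\n",
    "\n",
    "\n",
    "def combinations_loop(xs):\n",
    "    # Every pair without repetition: the inner index starts at i + 1.\n",
    "    return [(xs[i], xs[j]) for i in range(len(xs)) for j in range(i + 1, len(xs))]\n",
    "\n",
    "\n",
    "# Apply the loops to one event (one inner list) at a time.\n",
    "loop_cartesian = [\n",
    "    cartesian_loop(xs, ys) for xs, ys in zip(numbers.to_list(), letters.to_list())\n",
    "]\n",
    "loop_combinations = [combinations_loop(xs) for xs in numbers.to_list()]\n",
    "\n",
    "print(loop_cartesian == ak.cartesian((numbers, letters)).to_list())\n",
    "print(loop_combinations == ak.combinations(numbers, 2).to_list())\n",
    "```\n",
    "\n",
    "Both comparisons should agree; the Awkward functions do the same pairing without a Python-level loop over events."
   ]
  },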
  {
   "cell_type": "markdown",
   "id": "f891b767",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Application to dimuons"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "536244e3",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "The dimuon search in the previous lesson was a little naive in that we required *exactly two* muons to exist in every event and only computed the mass of that combination. If a third muon were present because it's a complex electroweak decay or because something was mismeasured, we would be blind to the other two muons. They might be real dimuons.\n",
    "\n",
    "A better procedure would be to look for all pairs of muons in an event and apply some criteria for selecting them.\n",
    "\n",
    "In this example, we'll [ak.zip](https://awkward-array.readthedocs.io/en/latest/_auto/ak.zip.html) the muon variables together into records."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2319e9f",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import uproot\n",
    "import awkward as ak\n",
    "\n",
    "file = uproot.open(\n",
    "    \"https://github.com/jpivarski-talks/2023-12-18-hsf-india-tutorial-bhubaneswar/raw/main/data/SMHiggsToZZTo4L.root\"\n",
    ")\n",
    "tree = file[\"Events\"]\n",
    "\n",
    "arrays = tree.arrays(filter_name=\"/Muon_(pt|eta|phi|charge)/\", entry_stop=10000)\n",
    "\n",
    "muons = ak.zip(\n",
    "    {\n",
    "        \"pt\": arrays[\"Muon_pt\"],\n",
    "        \"eta\": arrays[\"Muon_eta\"],\n",
    "        \"phi\": arrays[\"Muon_phi\"],\n",
    "        \"charge\": arrays[\"Muon_charge\"],\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fab27117",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "arrays.type.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e22e64e8",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "muons.type.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9073550",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "The difference between `arrays` and `muons` is that `arrays` contains separate lists of `\"Muon_pt\"`, `\"Muon_eta\"`, `\"Muon_phi\"`, `\"Muon_charge\"`, while `muons` contains lists of records with `\"pt\"`, `\"eta\"`, `\"phi\"`, `\"charge\"` fields."
   ]
  },
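  {
   "cell_type": "markdown",
   "id": "e5c3b2a4",
   "metadata": {},
   "source": [
    "To make the distinction concrete without downloading the file, here is a small sketch using made-up values. The arrays `pt` and `charge` and the name `toy_muons` are hypothetical stand-ins for the `Muon_pt` and `Muon_charge` branches and for `muons`.\n",
    "\n",
    "```python\n",
    "import awkward as ak\n",
    "\n",
    "# Hypothetical toy values: three events with 2, 0, and 1 muons.\n",
    "pt = ak.Array([[31.1, 12.4], [], [45.0]])\n",
    "charge = ak.Array([[1, -1], [], [-1]])\n",
    "\n",
    "# Zipping turns separate lists of numbers into lists of records.\n",
    "toy_muons = ak.zip({\"pt\": pt, \"charge\": charge})\n",
    "\n",
    "print(toy_muons.fields)              # ['pt', 'charge']\n",
    "print(toy_muons.to_list())           # one record per muon, grouped by event\n",
    "print(toy_muons.pt.to_list())        # [[31.1, 12.4], [], [45.0]]\n",
    "print(ak.num(toy_muons).to_list())   # [2, 0, 1]\n",
    "```\n",
    "\n",
    "Each record now travels as a unit, so slicing or pairing `toy_muons` keeps `pt` and `charge` of the same muon together."
   ]
  },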
  {
   "cell_type": "markdown",
   "id": "e5d22c0c",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Now we can compute pairs of muon *objects*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "405cd5e4",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "pairs = ak.combinations(muons, 2)\n",
    "pairs.type.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d35e896",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "and separate them into arrays of the first muon and the second muon in each pair."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a696fcf",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "mu1, mu2 = ak.unzip(pairs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47338d3e",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "**Quick quiz:** how would you ensure that all lists of records in `mu1` and `mu2` have the same lengths? Hint: see [ak.num](https://awkward-array.readthedocs.io/en/latest/_auto/ak.num.html) and [ak.all](https://awkward-array.readthedocs.io/en/latest/_auto/ak.all.html).\n",
    "\n",
    "Since they do have the same lengths, we can use them in a formula."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73af7a52",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "mass = np.sqrt(\n",
    "    2 * mu1.pt * mu2.pt * (np.cosh(mu1.eta - mu2.eta) - np.cos(mu1.phi - mu2.phi))\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17af8b2f",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "**Quick quiz:** how many masses do we have in each event? How does this compare with `muons`, `mu1`, and `mu2`?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "889c348d",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Plotting the jagged array"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f353e537",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Since this `mass` is a jagged array, it can't be directly histogrammed. Histograms take a set of *numbers* as inputs, but this array contains *lists*.\n",
    "\n",
    "Supposing you just want to plot the numbers from the lists, you can use [ak.flatten](https://awkward-array.readthedocs.io/en/latest/_auto/ak.flatten.html) to flatten one level of list or [ak.ravel](https://awkward-array.readthedocs.io/en/latest/_auto/ak.ravel.html) to flatten all levels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5a25b9e",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import hist\n",
    "\n",
    "hist.Hist(hist.axis.Regular(120, 0, 120, label=\"mass [GeV]\")).fill(\n",
    "    ak.ravel(mass)\n",
    ").plot()\n",
    "\n",
    "None"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9455a9c",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Alternatively, suppose you want to plot the *maximum* mass-candidate in each event, biasing it toward Z bosons. [ak.max](https://awkward-array.readthedocs.io/en/latest/_auto/ak.max.html) is a different function that picks one element from each list, when used with `axis=1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02c82332",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "ak.max(mass, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5837499",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Some values are `None` because there is no maximum of an empty list. [ak.flatten](https://awkward-array.readthedocs.io/en/latest/_auto/ak.flatten.html)/[ak.ravel](https://awkward-array.readthedocs.io/en/latest/_auto/ak.ravel.html) remove missing values (`None`) as well as squashing lists,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "648d3ee0",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "ak.flatten(ak.max(mass, axis=1), axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b17abd2",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "but so does removing the empty lists in the first place."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1048884",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "ak.max(mass[ak.num(mass) > 0], axis=1)"
   ]
  },
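  {
   "cell_type": "markdown",
   "id": "f6d4c3b5",
   "metadata": {},
   "source": [
    "The two ways of dropping missing values shown above can be compared on a small example. This is only a sketch: `toy_mass` and its values are hypothetical and stand in for the real `mass` array.\n",
    "\n",
    "```python\n",
    "import awkward as ak\n",
    "\n",
    "# Hypothetical toy masses: the middle event has no candidates.\n",
    "toy_mass = ak.Array([[88.0, 12.5], [], [91.5]])\n",
    "\n",
    "# The empty list has no maximum, so its entry is None.\n",
    "per_event_max = ak.max(toy_mass, axis=1)\n",
    "print(per_event_max.to_list())  # [88.0, None, 91.5]\n",
    "\n",
    "# Option 1: compute the maximum first, then drop the None values.\n",
    "dropped_after = ak.flatten(per_event_max, axis=0)\n",
    "\n",
    "# Option 2: drop the empty lists first, then compute the maximum.\n",
    "dropped_before = ak.max(toy_mass[ak.num(toy_mass) > 0], axis=1)\n",
    "\n",
    "print(dropped_after.to_list())   # [88.0, 91.5]\n",
    "print(dropped_before.to_list())  # [88.0, 91.5]\n",
    "```\n",
    "\n",
    "Either result no longer lines up with the original events, so keep the `None` values if you still need one entry per event."
   ]
  },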
  {
   "cell_type": "markdown",
   "id": "c1b29eee",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Exercise: select pairs of muons with opposite charges"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85a10733",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "This is neither an event-level cut nor a particle-level cut; it is a cut on particle *pairs*."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c1d770f",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "### Solution\n",
    "\n",
    "The `mu1` and `mu2` variables are the left and right halves of muon pairs. Therefore,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "faddaa48",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "cut = (mu1.charge != mu2.charge)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91b6506e",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "has the right multiplicity to be applied to the `mass` array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e8ab66f",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "hist.Hist(hist.axis.Regular(120, 0, 120, label=\"mass [GeV]\")).fill(\n",
    "\n",
    "    ak.ravel(mass[cut])\n",
    "\n",
    ").plot()\n",
    "\n",
    "None"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7ae0d9e-77a0-42a7-996c-8209dc21e493",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "plots the cleaned muon pairs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a780f85-c53c-4322-afe7-f53aa4e08bfd",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Exercise (harder): plot the one mass candidate per event that is strictly closest to the Z mass"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24b016ea-bfe8-456c-99b7-b51d060fc8ac",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Instead of just taking the maximum mass in each event, find the one with the minimum difference between computed mass and `zmass = 91`.\n",
    "\n",
    "**Hint:** use [ak.argmin](https://awkward-array.readthedocs.io/en/latest/_auto/ak.argmin.html) with `keepdims=True`.\n",
    "\n",
    "Anticipating one of the future lessons, you could get a more accurate mass by asking the Particle library:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42029417",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import particle, hepunits\n",
    "\n",
    "zmass = particle.Particle.findall(\"Z0\")[0].mass / hepunits.GeV"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f36f265",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "### Solution\n",
    "\n",
    "Instead of maximizing `mass`, we want to minimize `abs(mass - zmass)` and apply that choice to `mass`. [ak.argmin](https://awkward-array.readthedocs.io/en/latest/_auto/ak.argmin.html) returns the *index position* of this minimum difference, which we can then apply to the original `mass`. However, without `keepdims=True`, [ak.argmin](https://awkward-array.readthedocs.io/en/latest/_auto/ak.argmin.html) removes the dimension we would need for this array to have the same nested shape as `mass`. Therefore, we set `keepdims=True` and then flatten the result completely, with [ak.ravel](https://awkward-array.readthedocs.io/en/latest/_auto/ak.ravel.html) (or [ak.flatten](https://awkward-array.readthedocs.io/en/latest/_auto/ak.flatten.html) with `axis=None`), to get rid of missing values and flatten lists.\n",
    "\n",
    "Flattening one axis at a time would instead require two applications of [ak.flatten](https://awkward-array.readthedocs.io/en/latest/_auto/ak.flatten.html): one for squashing lists at the first level and another for removing `None` at the second level."
   ]
  },
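  {
   "cell_type": "markdown",
   "id": "a7e5d4c6",
   "metadata": {},
   "source": [
    "The role of `keepdims=True` is easier to see on a small array. In the sketch below, `toy_mass` and `target` are hypothetical values chosen for illustration; `target` uses the rounded `zmass = 91` from the exercise statement.\n",
    "\n",
    "```python\n",
    "import awkward as ak\n",
    "\n",
    "# Hypothetical toy masses: the middle event has no candidates.\n",
    "toy_mass = ak.Array([[88.0, 12.5, 95.0], [], [60.0, 91.5]])\n",
    "target = 91.0\n",
    "\n",
    "distance = abs(toy_mass - target)\n",
    "\n",
    "# Without keepdims, the result has one index per event (no nesting).\n",
    "flat_index = ak.argmin(distance, axis=1)\n",
    "print(flat_index.to_list())    # [0, None, 1]\n",
    "\n",
    "# With keepdims, each index stays inside a length-1 list, matching toy_mass.\n",
    "nested_index = ak.argmin(distance, axis=1, keepdims=True)\n",
    "print(nested_index.to_list())  # [[0], [None], [1]]\n",
    "\n",
    "# The nested index can slice toy_mass; ak.ravel then drops lists and None.\n",
    "closest = toy_mass[nested_index]\n",
    "print(closest.to_list())            # [[88.0], [None], [91.5]]\n",
    "print(ak.ravel(closest).to_list())  # [88.0, 91.5]\n",
    "```\n",
    "\n",
    "The nested index is what allows it to be used as a slice that picks one candidate inside each event."
   ]
  },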
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9865f55e",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "which = ak.argmin(abs(mass - zmass), axis=1, keepdims=True)\n",
    "\n",
    "hist.Hist(hist.axis.Regular(120, 0, 120, label=\"mass [GeV]\")).fill(\n",
    "\n",
    "    ak.flatten(mass[which], axis=None)\n",
    "\n",
    ").plot()\n",
    "\n",
    "None"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "notebook_metadata_filter": "-all"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}