---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.1
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# What is an "Awkward" Array?


```{code-cell} ipython3
import numpy as np
import awkward as ak
```

## Versatile Arrays
Awkward Arrays are general tree-like data structures, like JSON, but contiguous in memory and operated upon with compiled, vectorized code like NumPy.

They look like NumPy arrays:


```{code-cell} ipython3
ak.Array([1, 2, 3])
```

Like NumPy, they can have multiple dimensions:


```{code-cell} ipython3
ak.Array([
    [1, 2, 3],
    [4, 5, 6]
])
```

These dimensions can have varying lengths; arrays can be [ragged](https://en.wikipedia.org/wiki/Jagged_array):


```{code-cell} ipython3
ak.Array([
    [1, 2, 3],
    [4],
    [5, 6]
])
```

Each dimension can contain missing values:


```{code-cell} ipython3
ak.Array([
    [1, 2, 3],
    [4],
    [5, 6, None]
])
```

Awkward Arrays can store _numbers_:


```{code-cell} ipython3
ak.Array([
    [3, 141], 
    [59, 26, 535], 
    [8]
])
```

They can also work with _dates_:


```{code-cell} ipython3
ak.Array(
    [
        [np.datetime64("1815-12-10"), np.datetime64("1969-07-16")],
        [np.datetime64("1564-04-26")],
    ]
)
```

They can even work with _strings_:


```{code-cell} ipython3
ak.Array(
    [
        [
            "Benjamin List",
            "David MacMillan",
        ],
        [
            "Emmanuelle Charpentier",
            "Jennifer A. Doudna",
        ],
    ]
)
```

Awkward Arrays can have structure through _records_:


```{code-cell} ipython3
ak.Array(
    [
        [
            {"name": "Benjamin List", "age": 53},
            {"name": "David MacMillan", "age": 53},
        ],
        [
            {"name": "Emmanuelle Charpentier", "age": 52},
            {"name": "Jennifer A. Doudna", "age": 57},
        ],
        [
            {"name": "Akira Yoshino", "age": 73},
            {"name": "M. Stanley Whittingham", "age": 79},
            {"name": "John B. Goodenough", "age": 98},
        ],
    ]
)
```

In fact, Awkward Arrays can represent many kinds of jagged data. They can possess complex structures that mix records, and primitive types.


```{code-cell} ipython3
ak.Array(
    [
        [
            {
                "name": "Benjamin List",
                "age": 53,
                "institutions": [
                    "University of Cologne",
                    "Max Planck Institute for Coal Research",
                    "Hokkaido University",
                ],
            },
            {
                "name": "David MacMillan",
                "age": 53,
                "institutions": None,
            },
        ]
    ]
)
```

They can even contain unions!


```{code-cell} ipython3
ak.Array(
    [
        [np.datetime64("1815-12-10"), "Cassini"],
        [np.datetime64("1564-04-26")],
    ]
)
```

## NumPy-like interface

Awkward Array _looks like_ NumPy. It behaves identically to NumPy for regular arrays


```{code-cell} ipython3
x = ak.Array([
    [1, 2, 3],
    [4, 5, 6]
]);
```


```{code-cell} ipython3
ak.sum(x, axis=-1)
```

providing a similar high-level API, and implementing the [ufunc](https://numpy.org/doc/stable/reference/ufuncs.html) mechanism:


```{code-cell} ipython3
powers_of_two = ak.Array(
    [
        [1, 2, 4],
        [None, 8],
        [16],
    ]
);
```


```{code-cell} ipython3
ak.sum(powers_of_two)
```

But generalises to the tricky kinds of data that NumPy struggles to work with. It can perform reductions through varying length lists:

![](example-reduction-sum.svg)


```{code-cell} ipython3
ak.sum(powers_of_two, axis=0)
```

## Lightweight structures
Awkward makes it east to pull apart record structures:


```{code-cell} ipython3
nobel_prize_winner = ak.Array(
    [
        [
            {"name": "Benjamin List", "age": 53},
            {"name": "David MacMillan", "age": 53},
        ],
        [
            {"name": "Emmanuelle Charpentier", "age": 52},
            {"name": "Jennifer A. Doudna", "age": 57},
        ],
        [
            {"name": "Akira Yoshino", "age": 73},
            {"name": "M. Stanley Whittingham", "age": 79},
            {"name": "John B. Goodenough", "age": 98},
        ],
    ]
);
```


```{code-cell} ipython3
nobel_prize_winner.name
```


```{code-cell} ipython3
nobel_prize_winner.age
```

These records are lightweight, and simple to compose:


```{code-cell} ipython3
nobel_prize_winner_with_birth_year = ak.zip({
    "name": nobel_prize_winner.name,
    "age": nobel_prize_winner.age,
    "birth_year": 2021 - nobel_prize_winner.age
});
```


```{code-cell} ipython3
nobel_prize_winner_with_birth_year.show()
```

## High performance
Like NumPy, Awkward Array performs computations in fast, optimised kernels.


```{code-cell} ipython3
large_array = ak.Array([[1, 2, 3], [], [4, 5]] * 1_000_000)
```

We can compute the sum in `3.37 ms ± 107 µs` on a reference CPU:


```{code-cell} ipython3
ak.sum(large_array)
```

The same sum can be computed with pure-Python over the flattened array in `369 ms ± 8.07 ms`:


```{code-cell} ipython3
large_flat_array = ak.ravel(large_array)

sum(large_flat_array)
```

These performance values are not benchmarks; they are only an indication of the speed of Awkward Array.

Some problems are hard to solve with array-oriented programming. Awkward Array supports [Numba](https://numba.pydata.org/) out of the box:

```{code-cell} ipython3
import numba as nb

@nb.njit
def cumulative_sum(arr):
    result = 0
    for x in arr:
        for y in x:
            result += y
    return result
    
cumulative_sum(large_array)
```