---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.14.0
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

Generic buffers
===============

Most of the conversion functions target a particular library: NumPy, Arrow, Pandas, or Python itself. As a catch-all for other storage formats, Awkward Arrays can be converted to and from sets of named buffers. The buffers are (usually) not intelligible on their own; the length of the array and a JSON document describing its structure are needed to reconstitute the original. This section demonstrates how a set of buffers can be used to store an Awkward Array in an HDF5 file, a format that ordinarily can't represent nested, irregular data structures.

```{code-cell} ipython3
import awkward as ak
import numpy as np
import h5py
import json
```

From Awkward to buffers
-----------------------

Consider the following complex array:

```{code-cell} ipython3
ak_array = ak.Array(
    [
        [{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}, {"x": 3.3, "y": [1, 2, 3]}],
        [],
        [{"x": 4.4, "y": [1, 2, 3, 4]}, {"x": 5.5, "y": [1, 2, 3, 4, 5]}],
    ]
)
ak_array
```

The {func}`ak.to_buffers` function decomposes it into a set of one-dimensional arrays (a zero-copy operation).

```{code-cell} ipython3
form, length, container = ak.to_buffers(ak_array)
```

The pieces needed to reconstitute this array are:

   * the {class}`ak.forms.Form`, which defines how structure is built from one-dimensional arrays,
   * the length of the original array,
   * the one-dimensional arrays in the `container` (a {class}`collections.abc.MutableMapping`).

The {class}`ak.forms.Form` is like an Awkward {class}`ak.types.Type` in that it describes how the data are structured, but with more detail: it includes distinctions such as the difference between {class}`ak.contents.ListArray` and {class}`ak.contents.ListOffsetArray`, as well as the integer types of structural {class}`ak.index.Index`.

It is usually presented as JSON, and has a compact JSON format (when {meth}`ak.forms.Form.to_json` is invoked).

```{code-cell} ipython3
form
```

In this case, the `length` is just an integer. It would be a list of integers if `ak_array` were partitioned.

```{code-cell} ipython3
length
```

This `container` is a new dict, but it could have been a user-specified {class}`collections.abc.MutableMapping` if passed into {func}`ak.to_buffers` as an argument.

```{code-cell} ipython3
container
```
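To see why the buffers need the form to be understood, it can help to interpret them by hand. The sketch below uses only NumPy and is not how Awkward Array rebuilds arrays internally. The four arrays are hand-written stand-ins for the buffers of `ak_array`, and the buffer names in the comments assume the default naming scheme.

```python
import numpy as np

# Hand-written stand-ins for the buffers of `ak_array` (assumed values).
outer_offsets = np.array([0, 3, 3, 5], dtype=np.int64)    # e.g. "node0-offsets"
x_data = np.array([1.1, 2.2, 3.3, 4.4, 5.5])              # e.g. "node2-data"
y_offsets = np.array([0, 1, 3, 6, 10, 15], dtype=np.int64)  # e.g. "node3-offsets"
y_data = np.array([1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5])  # e.g. "node4-data"


def record(i):
    # Record i takes x[i] and the slice of y_data delimited by y_offsets.
    y = y_data[y_offsets[i]:y_offsets[i + 1]].tolist()
    return {"x": float(x_data[i]), "y": y}


# Each outer list k spans records outer_offsets[k] up to outer_offsets[k + 1].
rebuilt = [
    [record(i) for i in range(outer_offsets[k], outer_offsets[k + 1])]
    for k in range(len(outer_offsets) - 1)
]
print(rebuilt)
```

Without knowing which buffer holds offsets, which holds data, and how they nest (the information carried by the form), the same four arrays could be read in many other ways.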

From buffers to Awkward
-----------------------

The function that reverses {func}`ak.to_buffers` is {func}`ak.from_buffers`. Its first three arguments are `form`, `length`, and `container`.

```{code-cell} ipython3
ak.from_buffers(form, length, container)
```

Minimizing the size of the output buffers
-----------------------------------------

The {func}`ak.to_buffers`/{func}`ak.from_buffers` functions exactly preserve an array, warts and all. Often, you'll only want to write arrays that have been passed through {func}`ak.to_packed`. "Packing" replaces an array structure with an equivalent structure that has no unreachable elements: data that you can't see as part of the array, and therefore probably don't want to write.

Here is an example of an array in need of packing:

```{code-cell} ipython3
unpacked = ak.Array(
    ak.contents.ListArray(
        ak.index.Index64(np.array([4, 10, 1])),
        ak.index.Index64(np.array([7, 10, 3])),
        ak.contents.NumpyArray(np.array([999, 4.4, 5.5, 999, 1.1, 2.2, 3.3, 999])),
    )
)
unpacked
```

This {class}`ak.contents.ListArray` is in a strange order and the `999` values are unreachable. (Also, using `starts[1] == stops[1] == 10` to represent an empty list is a little odd, though allowed by the specification.)

The {func}`ak.to_buffers` function dutifully writes the `999` values into the output, even though they're not visible in the array.

```{code-cell} ipython3
ak.to_buffers(unpacked)
```

If the intended purpose of calling {func}`ak.to_buffers` is to write to a file or send data over a network, this is wasted space. It can be trimmed by calling the {func}`ak.to_packed` function.

```{code-cell} ipython3
packed = ak.to_packed(unpacked)
packed
```

At a high level, the array appears to be the same, but its low-level structure is quite different:

```{code-cell} ipython3
unpacked.layout
```

```{code-cell} ipython3
packed.layout
```
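The effect of packing on this particular list structure can be reproduced by hand with NumPy. This is only a sketch of the idea for a single list layer with the `starts`, `stops`, and content values used above; it is not {func}`ak.to_packed`'s actual implementation, which handles every node type.

```python
import numpy as np

# The same starts, stops, and content as the unpacked example above.
starts = np.array([4, 10, 1])
stops = np.array([7, 10, 3])
content = np.array([999, 4.4, 5.5, 999, 1.1, 2.2, 3.3, 999])

# List lengths are stops - starts; their running sum gives contiguous offsets.
counts = stops - starts
offsets = np.concatenate([[0], np.cumsum(counts)])

# Gather only the reachable elements, in the order the lists present them.
packed_content = np.concatenate([content[a:b] for a, b in zip(starts, stops)])

print(offsets)
print(packed_content)
```

The unreachable `999` values never fall inside any `starts`/`stops` window, so they drop out of the gathered content.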

This version of the array is more concise when written with {func}`ak.to_buffers`:

```{code-cell} ipython3
ak.to_buffers(packed)
```

Saving Awkward Arrays to HDF5
-----------------------------

The [h5py](https://www.h5py.org/) library presents each group in an HDF5 file as a {class}`collections.abc.MutableMapping`, which we can use as a container for a set of buffers. We must also save the `form` and `length` as metadata for the array to be retrievable.

```{code-cell} ipython3
file = h5py.File("/tmp/example.hdf5", "w")
group = file.create_group("awkward")
group
```

We can fill this `group` as a `container` by passing it to {func}`ak.to_buffers`. (See the previous section for more on {func}`ak.to_packed`.)

```{code-cell} ipython3
form, length, container = ak.to_buffers(ak.to_packed(ak_array), container=group)
```

```{code-cell} ipython3
container
```

Now the HDF5 group has been filled with array pieces.

```{code-cell} ipython3
container.keys()
```

Here's one.

```{code-cell} ipython3
np.asarray(container["node0-offsets"])
```

Now we need to add the other information to the group as metadata. Since HDF5 attributes accept strings and numbers, we can store the form as a JSON string and the length as an integer.

```{code-cell} ipython3
group.attrs["form"] = form.to_json()
group.attrs["form"]
```

```{code-cell} ipython3
group.attrs["length"] = length
group.attrs["length"]
```

Reading Awkward Arrays from HDF5
--------------------------------

With that, we can reconstitute the array by supplying {func}`ak.from_buffers` the right arguments from the group and metadata.

The group can't be used as a `container` as-is, since subscripting it returns {class}`h5py.Dataset` objects, rather than arrays.

```{code-cell} ipython3
reconstituted = ak.from_buffers(
    ak.forms.from_json(group.attrs["form"]),
    group.attrs["length"],
    {k: np.asarray(v) for k, v in group.items()},
)
reconstituted
```