ak.to_buffers
Defined in awkward.operations.ak_to_buffers on line 16.
- ak.to_buffers(array, container=None, buffer_key='{form_key}-{attribute}', form_key='node{id}', *, id_start=0, backend=None, byteorder=ak._util.native_byteorder)
- Parameters:
array – Array-like data (anything ak.to_layout recognizes).
container (None or MutableMapping) – The str → NumPy arrays (or Python buffers) that represent the decomposed Awkward Array. This container is only assumed to have a __setitem__ method that accepts strings as keys.
buffer_key (str or callable) – Python format string containing "{form_key}" and/or "{attribute}", or a function that takes these (and/or layout) as keyword arguments and returns a string to use as a key for a buffer in the container. The form_key is the result of applying form_key (below), and the attribute is a hard-coded string representing the buffer’s function (e.g. "data", "offsets", "index").
form_key (str or callable) – Python format string containing "{id}", or a function that takes this (and/or layout) as a keyword argument and returns a string to use as a key for a Form node. Together, buffer_key and form_key link the attributes of each Form node to data in the container.
id_start (int) – Starting id to use in form_key and hence buffer_key. This integer increases in a depth-first walk over the array nodes and can be used to generate unique keys for each Form.
backend ("cpu", "cuda", "jax", None) – Backend to use to generate values that are put into the container. "cpu" makes NumPy arrays, which are in main memory (i.e. not on a GPU) and satisfy Python’s buffer protocol. If all the buffers in array already have this backend, they won’t be copied. If the backend is None (the default in the signature above), the backend of the layout is used to generate the buffers.
byteorder ("<", ">") – Endianness of buffers written to container. If the byteorder does not match the current system byteorder, the arrays will be copied.
Decomposes an Awkward Array into a Form and a collection of memory buffers, so that data can be losslessly written to file formats and storage devices that only map names to binary blobs (such as a filesystem directory).
This function returns a 3-tuple:
(form, length, container)
where the form is an ak.forms.Form (whose string representation is JSON),
the length is an integer (len(array)), and the container is either
the MutableMapping you passed in or a new dict containing the buffers (as
NumPy arrays).
These are also the first three arguments of ak.from_buffers, so a full
round-trip is
>>> reconstituted = ak.from_buffers(*ak.to_buffers(original))
The container argument lets you specify your own MutableMapping, which
might be an interface to some storage format or device (e.g. h5py). It’s
okay if the container drops NumPy’s dtype and shape information,
leaving raw bytes, since dtype and shape can be reconstituted from
the ak.forms.NumpyForm.
The buffer_key and form_key arguments let you configure the names of the
buffers added to the container and string labels on each Form node, so that
the two can be uniquely matched later. buffer_key and form_key are distinct
arguments to allow for more indirection (buffer keys can differ from Form keys,
as long as there’s a way to map them to each other) and because some Form nodes,
such as ak.forms.ListForm and ak.forms.UnionForm, have more than one attribute
(starts and stops for ak.forms.ListForm and tags and index for
ak.forms.UnionForm).
Awkward 1.x also included partition numbers ("part0-", "part1-", …) in
the buffer keys. In version 2.x onward, partitioning is handled externally by
Dask, but partition numbers can be emulated by prepending a fixed "partN-"
string to the buffer_key. The array represents exactly one partition.
Here is a simple example:
>>> original = ak.Array([[1, 2, 3], [], [4, 5]])
>>> form, length, container = ak.to_buffers(original)
>>> print(form)
{
    "class": "ListOffsetArray",
    "offsets": "i64",
    "content": {
        "class": "NumpyArray",
        "primitive": "int64",
        "form_key": "node1"
    },
    "form_key": "node0"
}
>>> length
3
>>> container
{'node0-offsets': array([0, 3, 3, 5]), 'node1-data': array([1, 2, 3, 4, 5])}
which may be read back with
>>> ak.from_buffers(form, length, container)
<Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64'>
If you intend to use this function for saving data, you may want to pack it
first with ak.to_packed.
See also ak.from_buffers and ak.to_packed.