How to convert to/from JSON
Any JSON data can be converted to Awkward Arrays and any Awkward Arrays can be converted to JSON. Awkward type information, such as the distinction between fixed-size and variable-length lists, is lost in the transformation to JSON, however.
import awkward as ak
import pathlib
From JSON to Awkward
The function for JSON → Awkward conversion is ak.from_json().
It can be given a JSON string:
ak.from_json("[[1.1, 2.2, 3.3], [], [4.4, 5.5]]")
[[1.1, 2.2, 3.3], [], [4.4, 5.5]]
-----------------------
backend: cpu
nbytes: 72 B
type: 3 * var * float64
or a file name, wrapped in pathlib.Path so that it is not mistaken for a JSON string:
!echo "[[1.1, 2.2, 3.3], [], [4.4, 5.5]]" > /tmp/awkward-example-1.json
ak.from_json(pathlib.Path("/tmp/awkward-example-1.json"))
[[1.1, 2.2, 3.3], [], [4.4, 5.5]]
-----------------------
backend: cpu
nbytes: 72 B
type: 3 * var * float64
If the dataset contains a single JSON object, an ak.Record is returned, rather than an ak.Array.
ak.from_json('{"x": 1, "y": [1, 2], "z": "hello"}')
<Record {x: 1, y: [1, 2], z: 'hello'} type='{x: int64, y: var * int64, z: s...'>
From Awkward to JSON
The function for Awkward → JSON conversion is ak.to_json().
With one argument, it returns a string.
ak.to_json(ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]]))
'[[1.1,2.2,3.3],[],[4.4,5.5]]'
But if a destination is given as the second argument, it is taken to be a file name, and the JSON is written to that file instead of being returned.
ak.to_json(ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]]), "/tmp/awkward-example-2.json")
!cat /tmp/awkward-example-2.json
[[1.1,2.2,3.3],[],[4.4,5.5]]
Conversion of different types
All of the rules that apply to Python objects in ak.from_iter() and ak.to_list() also apply to ak.from_json() and ak.to_json(), with JSON types taking the place of built-in Python types. (One exception: JSON has no equivalent of a Python tuple.)
Performance
Since Awkward Array internally uses RapidJSON to simultaneously parse and convert the JSON string, ak.from_json() and ak.to_json() should always be faster and use less memory than ak.from_iter() and ak.to_list(). Don’t pass through Python objects as an intermediate step when the data starts or ends as JSON: use the JSON converters directly.