How to list an array’s fields/columns/keys

%config InteractiveShell.ast_node_interactivity = "last_expr_or_assign"

Arrays of records

As seen in How to create arrays of records, one of Awkward Array’s most useful features is the ability to compose separate arrays into a single record structure:

import awkward as ak
import numpy as np

records = ak.Array(
    [
        {"x": 0.014309631995020777, "y": 0.7077380205549498},
        {"x": 0.44925764718311145, "y": 0.11927022136408238},
        {"x": 0.9870653236436898, "y": 0.1543661194285082},
        {"x": 0.7071893130949595, "y": 0.3966721033002645},
        {"x": 0.3059032831996634, "y": 0.5094743992919755},
    ]
)
[{x: 0.0143, y: 0.708},
 {x: 0.449, y: 0.119},
 {x: 0.987, y: 0.154},
 {x: 0.707, y: 0.397},
 {x: 0.306, y: 0.509}]
--------------------------------------------
backend: cpu
nbytes: 80 B
type: 5 * {
    x: float64,
    y: float64
}

The type of an array indicates which fields it contains. Here, the records array has two fields, "x" and "y":

print(records.type)
5 * {x: float64, y: float64}
records.type.show()
5 * {
    x: float64,
    y: float64
}

The ak.Array object itself provides a convenient ak.Array.fields property that returns the list of field names:

records.fields
['x', 'y']

In addition, Awkward Array provides a high-level ak.fields() function that returns the same result:

ak.fields(records)
['x', 'y']

Arrays of tuples

In addition to records, Awkward Array also has the concept of tuples:

tuples = ak.Array(
    [
        (1, 2, 3),
        (1, 2, 3),
    ]
)
[(1, 2, 3),
 (1, 2, 3)]
---------------------------------------------
backend: cpu
nbytes: 48 B
type: 2 * (
    int64,
    int64,
    int64
)

These look very similar to records, but the fields are un-named:

print(tuples.type)
2 * (int64, int64, int64)

Despite this, both the ak.fields() function and the ak.Array.fields property return non-empty lists of strings when used to query a tuple array:

ak.fields(tuples)
['0', '1', '2']
tuples.fields
['0', '1', '2']

The returned field names are string-quoted integers ("0", "1", …) that refer to zero-indexed tuple slots, and can be used to project the array:

tuples["0"]
[1,
 1]
---------------
backend: cpu
nbytes: 16 B
type: 2 * int64
tuples["1"]
[2,
 2]
---------------
backend: cpu
nbytes: 16 B
type: 2 * int64

The fields of records can also be accessed as attributes of the array:

records.x
[0.0143,
 0.449,
 0.987,
 0.707,
 0.306]
-----------------
backend: cpu
nbytes: 40 B
type: 5 * float64

The same is not true of tuples, because field names such as "0" are not valid Python identifiers:

tuples.0
  Cell In[14], line 1
    tuples.0
          ^
SyntaxError: invalid syntax

The close similarity between records and tuples naturally raises the question:

How do I know whether an array contains records or tuples?

The ak.is_tuple() function can be used to differentiate between the two:

ak.is_tuple(tuples)
True
ak.is_tuple(records)
False