How to examine an array’s type
The type of an Awkward Array can be determined using the ak.type() function or the ak.Array.type attribute of an array. It describes both the data types of an array, e.g. float64, and the structure of the array (how many dimensions there are, which dimensions are ragged, which dimensions contain missing values, etc.).
Array types
import awkward as ak
array = ak.Array(
    [
        ["Mr.", "Blue,", "you", "did", "it", "right"],
        ["But", "soon", "comes", "Mr.", "Night"],
        ["creepin'", "over"],
    ]
)
array.type.show()
3 * var * string
array.type.show() displays the type using an extended subset of the Datashape language, which describes both the shape and the layout of an array in terms of units and dimensions. array.type itself returns an ak.types.Type object, which can be inspected:
array.type
ArrayType(ListType(ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'})), 3, None)
ak.Array.type always returns an ak.types.ArrayType object describing the outermost length of the array, which is always known.[1] The ak.types.ArrayType wraps an ak.types.Type object, which represents an array of “something”. For example, an array of integers:
ak.Array([1, 2, 3]).type
ArrayType(NumpyType('int64'), 3, None)
The outermost ak.types.ArrayType object indicates that this array has a known length of 3. Its content
ak.Array([1, 2, 3]).type.content
NumpyType('int64')
describes the array itself, which is an array of np.int64.
Regular vs ragged dimensions
Regular arrays and ragged arrays have different types:
import numpy as np
regular = ak.from_numpy(np.arange(8).reshape(2, 4))
ragged = ak.from_regular(regular)
regular.type.show()
ragged.type.show()
2 * 4 * int64
2 * var * int64
In the Datashape language, ragged dimensions are described as var, whilst regular (fixed) dimensions are expressed by an integer representing their size. At the type level, the ragged type object does not contain any size information, as it is no longer a constant part of the type:
regular.type.content.size
4
ragged.type.content.size
AttributeError: 'ListType' object has no attribute 'size'
Records and tuples
An Awkward Array with records is expressed using curly braces, resembling a JSON object or Python dictionary:
poet_records = ak.Array(
    [
        {"first": "William", "last": "Shakespeare"},
        {"first": "Sylvia", "last": "Plath"},
        {"first": "Homer", "last": "Simpson"},
    ]
)
poet_records.type.show()
3 * {
    first: string,
    last: string
}
whereas an array with tuples is expressed using parentheses, resembling a Python tuple:
poet_tuples = ak.Array(
    [
        ("William", "Shakespeare"),
        ("Sylvia", "Plath"),
        ("Homer", "Simpson"),
    ]
)
poet_tuples.type.show()
3 * (
    string,
    string
)
The ak.types.RecordType object also records whether it represents a tuple:
poet_records.type.content.is_tuple
False
poet_tuples.type.content.is_tuple
True
Let’s look at the type of a simpler array:
ak.type([{"x": 1, "y": 2}, {"x": 3, "y": 4}])
ArrayType(RecordType([NumpyType('int64'), NumpyType('int64')], ['x', 'y']), 2, None)
Missing items
Missing items are represented by either the option[...] or the ? token, whichever is more readable:
missing = ak.Array([33.0, None, 15.5, 99.1])
missing.type.show()
4 * ?float64
Awkward’s ak.types.OptionType object is used to represent this datashape type:
missing.type
ArrayType(OptionType(NumpyType('float64')), 4, None)
Unions
A union is formed whenever multiple types are required for a particular dimension, e.g. if we concatenate two arrays with different records:
mixed = ak.concatenate(
    (
        [{"x": 1}],
        [{"y": 2}],
    )
)
mixed.type.show()
2 * union[
    {
        x: int64
    },
    {
        y: int64
    }
]
From the printed type, we can see that the union has two possible types. We can inspect these through the ak.types.UnionType object in mixed.type.content:
mixed.type.content
UnionType([RecordType([NumpyType('int64')], ['x']), RecordType([NumpyType('int64')], ['y'])])
mixed.type.content.contents[0].show()
{
    x: int64
}
mixed.type.content.contents[1].show()
{
    y: int64
}
Strings
Awkward Array implements strings as views over a 1D array of uint8 characters (char):
ak.type("hello world")
ArrayType(NumpyType('uint8', parameters={'__array__': 'char'}), 11, None)
This concept extends to an array of strings:
array = ak.Array(
    ["Mr.", "Blue,", "you", "did", "it", "right"]
)
array.type
ArrayType(ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'}), 6, None)
array is a list of strings, which is represented as a list-of-list-of-char. When we evaluate str(array.type) (or directly print this value with array.type.show()), Awkward returns a readable type-string:
array.type.show()
6 * string
Scalar types
The Array types section noted that all ak.types.Type objects are array types. For example, ak.types.NumpyType is the type of a NumPy (or CuPy, etc.) array of a fixed dtype:
import numpy as np
ak.type(np.arange(3))
ArrayType(NumpyType('int64'), 3, None)
Let’s now consider the following array of records:
record_array = ak.Array([
    {'x': 10, 'y': 11}
])
record_array.type
ArrayType(RecordType([NumpyType('int64'), NumpyType('int64')], ['x', 'y']), 1, None)
The resulting type object is an ak.types.ArrayType of ak.types.RecordType. This record type represents an array of records built from two NumPy arrays. Reading from outside to inside, the type object describes:
- an array of length 1,
- that is an array of records with two fields, ‘x’ and ‘y’,
- whose fields are both NumPy arrays of np.int64.
Now, what happens if we pull out a single record and inspect its type?
record = record_array[0]
record.type
ScalarType(RecordType([NumpyType('int64'), NumpyType('int64')], ['x', 'y']), None)
Unlike the ak.types.ArrayType objects returned by ak.type() for arrays, ak.Record.type always returns an ak.types.ScalarType object. Reading the returned type from outside to inside again, we have:
- a scalar taken from an array,
- that is an array of records with two fields, ‘x’ and ‘y’,
- whose fields are both NumPy arrays of np.int64.
Like ak.types.ArrayType, ak.types.ScalarType is an outermost type, but unlike ak.types.ArrayType it does more than add length information; it also removes a dimension from the final type!