ak.run_lengths

Defined in awkward.operations.ak_run_lengths on line 17.

ak.run_lengths(array, *, highlevel=True, behavior=None, attrs=None)
Parameters:
  • array – Array-like data (anything ak.to_layout recognizes).

  • highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.

  • behavior (None or dict) – Custom ak.behavior for the output array, if high-level.

  • attrs (None or dict) – Custom attributes for the output array, if high-level.

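Computes the lengths of runs (consecutive sequences of identical values) at the deepest level of nesting. The result keeps the same nesting as the input, with the same number of lists at every outer level, but the deepest level holds one int64 count per run instead of the original values.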

For example,

>>> array = ak.Array([1.1, 1.1, 1.1, 2.2, 3.3, 3.3, 4.4, 4.4, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>

There are 3 instances of 1.1, followed by 1 instance of 2.2, 2 instances of 3.3, 2 instances of 4.4, and 1 instance of 5.5.
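The counting rule behind this example can be written out in plain Python. The sketch below uses a hypothetical helper, `run_lengths_flat`, which is not part of Awkward Array; it assumes a flat Python list and starts a new run whenever a value differs from the one before it. Since every element belongs to exactly one run, the counts always sum to the input length (here 3 + 1 + 2 + 2 + 1 = 9).

```python
def run_lengths_flat(values):
    """Hypothetical helper: length of each run of equal, adjacent values."""
    lengths = []
    for i, value in enumerate(values):
        if i > 0 and value == values[i - 1]:
            # Same as the previous value: extend the current run.
            lengths[-1] += 1
        else:
            # First value, or a change of value: start a new run of length 1.
            lengths.append(1)
    return lengths


data = [1.1, 1.1, 1.1, 2.2, 3.3, 3.3, 4.4, 4.4, 5.5]
counts = run_lengths_flat(data)
print(counts)  # [3, 1, 2, 2, 1]

# Every input element belongs to exactly one run.
assert sum(counts) == len(data)
```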

The order and uniqueness of the input data don’t matter,

>>> array = ak.Array([1.1, 1.1, 1.1, 5.5, 4.4, 4.4, 1.1, 1.1, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>

only whether each value equals its neighbors. Because the two groups of 1.1 (and the two occurrences of 5.5) are not adjacent, each one is counted as a separate run.
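Python’s standard-library `itertools.groupby` follows the same adjacency rule: it groups only consecutive equal elements, so a value that reappears later starts a new group. The sketch below applies it to the reordered data as an illustration of the semantics only; it is not how Awkward Array implements ak.run_lengths.

```python
from itertools import groupby

data = [1.1, 1.1, 1.1, 5.5, 4.4, 4.4, 1.1, 1.1, 5.5]

# groupby yields (key, group) for each run of consecutive equal elements.
runs = [(key, len(list(group))) for key, group in groupby(data)]
print(runs)  # [(1.1, 3), (5.5, 1), (4.4, 2), (1.1, 2), (5.5, 1)]

# Keep only the lengths, matching the ak.run_lengths output above.
counts = [length for _, length in runs]
print(counts)  # [3, 1, 2, 2, 1]
```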

The data can be nested, but runs don’t cross list boundaries.

>>> array = ak.Array([[1.1, 1.1, 1.1, 2.2, 3.3], [3.3, 4.4], [4.4, 5.5]])
>>> ak.run_lengths(array)
<Array [[3, 1, 1], [1, 1], [1, 1]] type='3 * var * int64'>
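One way to picture the nested behavior is to apply the flat rule independently inside each innermost list. The hypothetical helper `run_lengths_nested` below assumes plain nested Python lists whose elements at each level are either all lists or all values; because each innermost list is processed on its own, the trailing 3.3 of the first list and the leading 3.3 of the second are never merged into one run.

```python
from itertools import groupby


def run_lengths_nested(data):
    """Hypothetical helper: run lengths computed at the deepest level of nesting.

    Assumes each level is either all lists (recurse into them) or all
    non-list values (count runs). Each innermost list is handled separately,
    so runs never cross list boundaries.
    """
    if data and isinstance(data[0], list):
        return [run_lengths_nested(sub) for sub in data]
    return [len(list(group)) for _, group in groupby(data)]


nested = [[1.1, 1.1, 1.1, 2.2, 3.3], [3.3, 4.4], [4.4, 5.5]]
print(run_lengths_nested(nested))  # [[3, 1, 1], [1, 1], [1, 1]]
```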

This function treats each string as a single value, rather than as a list of characters, so runs are counted over whole strings.

>>> array = ak.Array([["one", "one"], ["one", "two", "two"], ["three", "two", "two"]])
>>> ak.run_lengths(array)
<Array [[2], [1, 2], [1, 2]] type='3 * var * int64'>
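In Awkward Array a string is stored as a list of characters marked with a string parameter, so a purely structural “deepest level” rule would count runs of characters rather than runs of words. The plain-Python sketch below (an illustration, not the library’s code) treats each str as one value, which reproduces the output above, and contrasts that with descending into the characters of each word.

```python
from itertools import groupby

data = [["one", "one"], ["one", "two", "two"], ["three", "two", "two"]]

# Treat each string as a single, indivisible value: runs of equal words.
word_runs = [[len(list(g)) for _, g in groupby(sub)] for sub in data]
print(word_runs)  # [[2], [1, 2], [1, 2]]

# For contrast only: counting runs of characters inside every word,
# which is NOT what ak.run_lengths does for strings.
char_runs = [[[len(list(g)) for _, g in groupby(word)] for word in sub] for sub in data]
print(char_runs[2][0])  # runs in "three" (t, h, r, ee): [1, 1, 1, 2]
```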

Note that this can be combined with ak.argsort and ak.unflatten to compute a “group by” operation:

>>> array = ak.Array([{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1},
...                   {"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [1, 1, 1, 2, 2, 3] type='6 * int64'>
>>> ak.run_lengths(sorted.x)
<Array [3, 2, 1] type='3 * int64'>
>>> ak.unflatten(sorted, ak.run_lengths(sorted.x)).show()
[[{x: 1, y: 1.1}, {x: 1, y: 1.1}, {x: 1, y: 1.1}],
 [{x: 2, y: 2.2}, {x: 2, y: 2.2}],
 [{x: 3, y: 3.3}]]
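The three steps of this “group by” can be mirrored with plain Python lists, as a sketch of the idea rather than of Awkward Array’s implementation: a stable sort by key (standing in for array[ak.argsort(array.x)]), run lengths of the sorted keys (standing in for ak.run_lengths), and slicing at cumulative offsets (standing in for ak.unflatten). Python’s `sorted` is stable, so records with equal keys keep their original relative order.

```python
from itertools import accumulate, groupby

records = [
    {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1},
    {"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2},
]

# Step 1: stable sort by the grouping key.
sorted_records = sorted(records, key=lambda r: r["x"])

# Step 2: count each run of equal keys in the sorted order.
counts = [len(list(g)) for _, g in groupby(r["x"] for r in sorted_records)]

# Step 3: cut the sorted list at the cumulative offsets of the counts.
offsets = [0, *accumulate(counts)]
groups = [sorted_records[start:stop] for start, stop in zip(offsets, offsets[1:])]

print(counts)   # [3, 2, 1]
print(offsets)  # [0, 3, 5, 6]
for group in groups:
    print(group)
```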

Unlike a database “group by,” this operation can be applied in bulk to many sublists at once. To use the run lengths as counts for ak.unflatten, they must first be fully flattened (axis=None), and ak.unflatten must be given axis=-1 so that it splits the innermost lists.

>>> array = ak.Array([[{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1}],
...                   [{"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}]])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [[1, 1, 2], [1, 2, 3]] type='2 * var * int64'>
>>> ak.run_lengths(sorted.x)
<Array [[2, 1], [1, 1, 1]] type='2 * var * int64'>
>>> counts = ak.flatten(ak.run_lengths(sorted.x), axis=None)
>>> ak.unflatten(sorted, counts, axis=-1).show()
[[[{x: 1, y: 1.1}, {x: 1, y: 1.1}], [{x: 2, y: 2.2}]],
 [[{x: 1, y: 1.1}], [{x: 2, y: 2.2}], [{x: 3, y: 3.3}]]]
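Flattening the run lengths works because, within each sorted sublist, the counts sum exactly to that sublist’s length, so one flat sequence of counts can be consumed in order while splitting each innermost list. The plain-Python sketch below illustrates that bookkeeping on nested lists of dicts; it is an analogy to ak.unflatten(..., axis=-1), not its actual implementation, and the variable names are illustrative.

```python
from itertools import groupby

nested = [
    [{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1}],
    [{"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}],
]

# Sort within each sublist by key (analogous to sorting along axis=-1).
sorted_nested = [sorted(sub, key=lambda r: r["x"]) for sub in nested]

# Run lengths per sublist, then fully flattened into one list of counts.
per_sublist = [
    [len(list(g)) for _, g in groupby(r["x"] for r in sub)] for sub in sorted_nested
]
flat_counts = [c for counts in per_sublist for c in counts]
print(per_sublist)  # [[2, 1], [1, 1, 1]]
print(flat_counts)  # [2, 1, 1, 1, 1]

# Split each innermost list by consuming the flat counts in order; each
# sublist takes counts until they cover its full length.
remaining = iter(flat_counts)
grouped = []
for sub in sorted_nested:
    groups, start = [], 0
    while start < len(sub):
        stop = start + next(remaining)
        groups.append(sub[start:stop])
        start = stop
    grouped.append(groups)

print([[len(g) for g in subgroups] for subgroups in grouped])  # [[2, 1], [1, 1, 1]]
```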

See also ak.num, ak.argsort, ak.unflatten.