ak.run_lengths#
Defined in awkward.operations.ak_run_lengths on line 17.
- ak.run_lengths(array, *, highlevel=True, behavior=None, attrs=None)#
- Parameters:
array – Array-like data (anything
ak.to_layout
recognizes).highlevel (bool) – If True, return an
ak.Array
; otherwise, return a low-levelak.contents.Content
subclass.behavior (None or dict) – Custom
ak.behavior
for the output array, if high-level.attrs (None or dict) – Custom attributes for the output array, if high-level.
Computes the lengths of sequences of identical values at the deepest level
of nesting, returning an array with the same structure but with int64
type.
For example,
>>> array = ak.Array([1.1, 1.1, 1.1, 2.2, 3.3, 3.3, 4.4, 4.4, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>
There are 3 instances of 1.1, followed by 1 instance of 2.2, 2 instances of 3.3, 2 instances of 4.4, and 1 instance of 5.5.
The order and uniqueness of the input data doesn’t matter,
>>> array = ak.Array([1.1, 1.1, 1.1, 5.5, 4.4, 4.4, 1.1, 1.1, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>
just the difference between each value and its neighbors.
The data can be nested, but runs don’t cross list boundaries.
>>> array = ak.Array([[1.1, 1.1, 1.1, 2.2, 3.3], [3.3, 4.4], [4.4, 5.5]])
>>> ak.run_lengths(array)
<Array [[3, 1, 1], [1, 1], [1, 1]] type='3 * var * int64'>
This function recognizes strings as distinguishable values.
>>> array = ak.Array([["one", "one"], ["one", "two", "two"], ["three", "two", "two"]])
>>> ak.run_lengths(array)
<Array [[2], [1, 2], [1, 2]] type='3 * var * int64'>
Note that this can be combined with ak.argsort
and ak.unflatten
to compute
a “group by” operation:
>>> array = ak.Array([{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1},
... {"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [1, 1, 1, 2, 2, 3] type='6 * int64'>
>>> ak.run_lengths(sorted.x)
<Array [3, 2, 1] type='3 * int64'>
>>> ak.unflatten(sorted, ak.run_lengths(sorted.x)).show()
[[{x: 1, y: 1.1}, {x: 1, y: 1.1}, {x: 1, y: 1.1}],
[{x: 2, y: 2.2}, {x: 2, y: 2.2}],
[{x: 3, y: 3.3}]]
Unlike a database “group by,” this operation can be applied in bulk to many sublists
(though the run lengths need to be fully flattened to be used as counts
for
ak.unflatten
, and you need to specify axis=-1
as the depth).
>>> array = ak.Array([[{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1}],
... [{"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}]])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [[1, 1, 2], [1, 2, 3]] type='2 * var * int64'>
>>> ak.run_lengths(sorted.x)
<Array [[2, 1], [1, 1, 1]] type='2 * var * int64'>
>>> counts = ak.flatten(ak.run_lengths(sorted.x), axis=None)
>>> ak.unflatten(sorted, counts, axis=-1).show()
[[[{x: 1, y: 1.1}, {x: 1, y: 1.1}], [{x: 2, y: 2.2}]],
[[{x: 1, y: 1.1}], [{x: 2, y: 2.2}], [{x: 3, y: 3.3}]]]
See also ak.num
, ak.argsort
, ak.unflatten
.