ak.transform
------------

.. py:module: ak.transform

Defined in `awkward.operations.ak_transform <https://github.com/scikit-hep/awkward/blob/36da52cfa8846355c390beb6555eac1d31c27c26/src/awkward/operations/ak_transform.py>`__ on `line 27 <https://github.com/scikit-hep/awkward/blob/36da52cfa8846355c390beb6555eac1d31c27c26/src/awkward/operations/ak_transform.py#L27>`__.

.. py:function:: ak.transform(transformation, array, *more_arrays, depth_context=None, lateral_context=None, allow_records=True, broadcast_parameters_rule='intersect', left_broadcast=True, right_broadcast=True, numpy_to_regular=False, regular_to_jagged=False, return_value='simplified', expect_return_value=False, highlevel=True, behavior=None, attrs=None)


    :param transformation: Function to apply to each node of the array.
                       See below for details.
    :type transformation: callable
    :param array: Array-like data (anything :py:obj:`ak.to_layout` recognizes), but not an
              :py:obj:`ak.Record` or :py:obj:`ak.record.Record`.
    :param more_arrays: Additional arrays to be broadcasted together with the first ``array``
                    and used together in the transformation. See below for details.
    :param depth_context: User data to propagate through the transformation.
                      New data added to ``depth_context`` is available to the entire *subtree*
                      at which it is added, but no other *subtrees*. For example, data added
                      during the transformation will not be in the original ``depth_context``
                      after the transformation.
    :type depth_context: None or dict
    :param lateral_context: User data to propagate through the transformation.
                        New data added to ``lateral_context`` is available at any later step of
                        the depth-first walk over the tree, including *other subtrees*. For
                        example, data added during the transformation will be in the original
                        ``lateral_context`` after the transformation.
    :type lateral_context: None or dict
    :param allow_records: If False and the recursive walk encounters any
                      :py:obj:`ak.contents.RecordArray` nodes, an error is raised.
    :type allow_records: bool
    :param broadcast_parameters_rule: Rule for broadcasting parameters, one of:

                                  - ``"intersect"``
                                  - ``"all_or_nothing"``
                                  - ``"one_to_one"``
                                  - ``"none"``
    :type broadcast_parameters_rule: str
    :param left_broadcast: If ``more_arrays`` are provided, the parameter
                       determines whether the arrays are left-broadcasted, which is
                       Awkward-like broadcasting.
    :type left_broadcast: bool
    :param right_broadcast: If ``more_arrays`` are provided, the parameter
                        determines whether the arrays are right-broadcasted, which is
                        NumPy-like broadcasting.
    :type right_broadcast: bool
    :param numpy_to_regular: If True, multidimensional :py:obj:`ak.contents.NumpyArray`
                         nodes are converted into :py:obj:`ak.contents.RegularArray` nodes before
                         calling ``transformation``.
    :type numpy_to_regular: bool
    :param regular_to_jagged: If True, regular-type lists are converted into
                          variable-length lists before calling ``transformation``.
    :type regular_to_jagged: bool
    :param return_value: If ``"none"``, the return value of this function is None; if ``"original"``,
                     untouched nodes surrounding the ones replaced by the ``transformation`` are
                     returned in their original state; if ``"simplified"``, the
                     :py:obj:`ak.Content.simplified` constructor is used on the surrounding nodes to
                     ensure that option-type and union-type nodes are not nested inappropriately.
                     Note that if ``return_value`` is ``"none"``, the only way to get information
                     out of this function is through the ``lateral_context``.
    :type return_value: ``"none"``, ``"original"``, ``"simplified"``
    :param expect_return_value: If True, raise a ``RuntimeError`` if the ``transformation``
                            function does not terminate the recursion.
    :type expect_return_value: bool
    :param highlevel: If True, return an :py:obj:`ak.Array`; otherwise, return
                  a low-level :py:obj:`ak.contents.Content` subclass.
    :type highlevel: bool
    :param behavior: Custom :py:obj:`ak.behavior` for the output array, if
                 high-level.
    :type behavior: None or dict
    :param attrs: Custom attributes for the output array, if
              high-level.
    :type attrs: None or dict

Applies a ``transformation`` function to every node of an Awkward array or arrays
to either obtain a transformed copy or extract data from a walk over the arrays'
low-level layout nodes.

This is a public interface to the infrastructure that is used to implement most
Awkward Array operations. As such, it's very powerful, but low-level.

Here is a "hello world" example:

.. code-block:: python


    >>> def say_hello(layout, depth, **kwargs):
    ...     print("Hello", type(layout).__name__, "at", depth)
    ...
    >>> array = ak.Array([[1.1, 2.2, "three"], [], None, [4.4, 5.5]])
    >>> ak.transform(say_hello, array, return_value="none")
    Hello IndexedOptionArray at 1
    Hello ListOffsetArray at 1
    Hello UnionArray at 2
    Hello NumpyArray at 2
    Hello ListOffsetArray at 2
    Hello NumpyArray at 3

In the above, ``say_hello`` is called on every node of the ``array``, which has
a lot of nodes because it has nested lists, missing data, and a union of
different types. The data types are low-level "layouts," subclasses of
:py:obj:`ak.contents.Content`, rather than high-level :py:obj:`ak.Array`.

The primary purpose of this function is to allow you to edit one level of
structure without having to worry about what it's embedded in. Suppose, for
instance, you want to apply NumPy's ``np.round`` function to numerical data,
regardless of what lists or other structures they're embedded in.

The return value must be a subclass of :py:obj:`ak.contents.Content` (to replace the
array node) or None (to leave the array node unchanged).

.. code-block:: python


    >>> def rounder(layout, **kwargs):
    ...     if layout.is_numpy:
    ...         return ak.contents.NumpyArray(
    ...             np.round(layout.data).astype(np.int32)
    ...         )
    ...
    >>> array = ak.Array(
    ... [[[[[1.1, 2.2, 3.3], []], None], []],
    ...  [[[[4.4, 5.5]]]]]
    ... )
    >>> ak.transform(rounder, array).show(type=True)
    type: 2 * var * var * option[var * var * int32]
    [[[[[1, 2, 3], []], None], []],
     [[[[4, 6]]]]]

If you pass multiple arrays to this function (``more_arrays``), those arrays
will be broadcasted and all inputs, at the same level of depth and structure,
will be passed to the ``transformation`` function as a group.

Here is an example with broadcasting:

.. code-block:: python


    >>> def combine(layouts, **kwargs):
    ...     assert len(layouts) == 2
    ...     if layouts[0].is_numpy and layouts[1].is_numpy:
    ...         return ak.contents.NumpyArray(
    ...             layouts[0].data + 10 * layouts[1].data
    ...         )
    ...
    >>> array1 = ak.Array([[1, 2, 3], [], None, [4, 5]])
    >>> array2 = ak.Array([1, 2, 3, 4])
    >>> ak.transform(combine, array1, array2)
    <Array [[11, 12, 13], [], None, [44, 45]] type='4 * option[var * int64]'>

The ``1`` and ``4`` from ``array2`` are broadcasted to the ``[1, 2, 3]`` and the
``[4, 5]`` of ``array1``, and the other elements disappear because they are
broadcasted with an empty list and a missing value. Note that the first argument
of this ``transformation`` function is a *list* of layouts, not a single layout.
There are always 2 layouts because 2 arrays were passed to :py:obj:`ak.transform`.

Signature of the transformation function
========================================

If there is only one array, the first argument of ``transformation`` is a
:py:obj:`ak.contents.Content` instance. If there are multiple arrays (``more_arrays``),
the first argument is a list of :py:obj:`ak.contents.Content` instances.

All other arguments can be absorbed into a ``**kwargs`` because they will always
be passed to your function by keyword. They are

* depth (int): The current list depth, where 1 is the outermost array and
  higher numbers are deeper levels of list nesting. This does not count
  nesting of other data structures, such as option-types and records.
* depth_context (None or dict): Any user-specified data. You can add to
  this dict during transformation; changes would only be seen in the
  subtree's nodes.
* lateral_context (None or dict): Any user-specified data. You can add to
  this dict during transformation; changes would be seen in any node
  visited later in the depth-first search.
* continuation (callable): Zero-argument function that continues the
  recursion from this point in the walk, so that you can perform
  post-processing instead of pre-processing.
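
As a minimal sketch of how these keyword arguments can be consumed, the function below names only ``depth`` and absorbs everything else in ``**kwargs``. The names ``record_depth`` and ``seen`` are illustrative and not part of Awkward Array.

.. code-block:: python

    import awkward as ak

    seen = []  # (node type name, depth) pairs, in visitation order

    def record_depth(layout, depth, **kwargs):
        # ``depth`` arrives by keyword; the other arguments land in **kwargs.
        seen.append((type(layout).__name__, depth))
        # Implicitly returning None leaves the node unchanged and lets
        # the walk continue into its contents.

    array = ak.Array([[1, 2], [3]])
    ak.transform(record_depth, array, return_value="none")
    print(seen)

For this array, the walk should visit a ``ListOffsetArray`` at depth 1 and then a ``NumpyArray`` at depth 2, consistent with the "hello world" example above.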

For completeness, the following arguments are also passed to ``transformation``,
but you usually won't need them:

* behavior (None or dict): Behavior that would be attached to the output
  array(s) if ``highlevel``.
* backend (array library / kernel library shim): Handle to the NumPy
  library, CuPy, etc., depending on the type of arrays.
* options (dict): Options provided to :py:obj:`ak.transform`.

If there is only one array, the ``transformation`` function must either return
None or return an :py:obj:`ak.contents.Content`.

If there are multiple arrays (``more_arrays``), then the transformation function
may return one array or a tuple of arrays. (The preferred type is a tuple, even
if it has length 1.)

The final return value of :py:obj:`ak.transform` is a new array or tuple of arrays
constructed by replacing nodes when ``transformation`` returns a
:py:obj:`ak.contents.Content` or tuple of :py:obj:`ak.contents.Content`, and leaving
nodes unchanged when ``transformation`` returns None. If ``transformation`` returns
length-1 tuples, the final output is an array, not a length-1 tuple.
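
The following sketch illustrates the tuple form of the return value: a two-array transformation that returns two leaf nodes, so that :py:obj:`ak.transform` returns two arrays. The function name ``sum_and_difference`` and the sample data are made up for illustration.

.. code-block:: python

    import awkward as ak

    def sum_and_difference(layouts, **kwargs):
        left, right = layouts
        # Act only once broadcasting has reduced both inputs to leaf nodes.
        if left.is_numpy and right.is_numpy:
            return (
                ak.contents.NumpyArray(left.data + right.data),
                ak.contents.NumpyArray(left.data - right.data),
            )
        # Otherwise return None so that the walk keeps descending.

    array1 = ak.Array([[1, 2, 3], [], [4, 5]])
    array2 = ak.Array([10, 20, 30])

    total, difference = ak.transform(sum_and_difference, array1, array2)
    print(total.to_list())
    print(difference.to_list())

Both outputs keep the list structure of ``array1``, because the surrounding list nodes are rebuilt around each returned leaf.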

If ``return_value`` is ``"none"``, :py:obj:`ak.transform` returns None. This is useful for
functions that return non-array data through ``lateral_context``. The other two
choices, ``"original"`` and ``"simplified"``, determine how untouched array nodes,
the ones that are *not* modified by the ``transformation`` function, are returned.
With ``"original"``, they are returned without modification, which might result
in illegal combinations of option-type and union-type, which would raise an
error. With ``"simplified"``, the surrounding array nodes are simplified upon
reconstruction. For example, if the ``transformation`` puts a new :py:obj:`ak.contents.ByteMaskedArray`
inside an existing :py:obj:`ak.contents.ByteMaskedArray`, the two will be consolidated
into a single option-type array node.
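
As a small sketch of the ``return_value="none"`` mode, the function below extracts a plain Python number instead of building an array. The name ``count_leaf_values`` and the ``"count"`` key are illustrative. It counts the entries of each leaf buffer, which equals the number of values only for an array built directly from Python data, as here.

.. code-block:: python

    import awkward as ak

    def count_leaf_values(layout, lateral_context, **kwargs):
        # Leaves are NumpyArray nodes; accumulate their lengths.
        if layout.is_numpy:
            lateral_context["count"] += len(layout)

    array = ak.Array([[1.1, 2.2], [], [3.3]])
    context = {"count": 0}

    result = ak.transform(
        count_leaf_values, array, lateral_context=context, return_value="none"
    )
    print(result)            # no array is returned in this mode
    print(context["count"])  # the information comes back through the context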

Contexts
========

The ``depth_context`` and ``lateral_context`` allow you to pass your own data into
the transformation as well as communicate between calls of ``transformation`` on
different nodes. The ``depth_context`` limits this communication to descendants
of the subtree in which the data were added; ``lateral_context`` does not have
this limit. (``depth_context`` is shallow-copied at each node during descent;
``lateral_context`` is never copied.)

For example, consider this array:

.. code-block:: python


    >>> array = ak.Array([
    ...     [{"x": [1], "y": 1.1}, {"x": [1, 2], "y": 2.2}, {"x": [1, 2, 3], "y": 3.3}],
    ...     [],
    ...     [{"x": [1, 2, 3, 4], "y": 4.4}, {"x": [1, 2, 3, 4, 5], "y": 5.5}],
    ... ])

If we accumulate node type names using ``depth_context``,

.. code-block:: python


    >>> def crawl(layout, depth_context, **kwargs):
    ...     depth_context["types"] = depth_context["types"] + (type(layout).__name__,)
    ...     print(depth_context["types"])
    ...
    >>> context = {"types": ()}
    >>> ak.transform(crawl, array, depth_context=context, return_value="none")
    ('ListOffsetArray',)
    ('ListOffsetArray', 'RecordArray')
    ('ListOffsetArray', 'RecordArray', 'ListOffsetArray')
    ('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray')
    ('ListOffsetArray', 'RecordArray', 'NumpyArray')
    >>> context
    {'types': ()}

The data in ``depth_context["types"]`` represents a path from the root of the
tree to the current node. There is never, for instance, more than one leaf-type
(:py:obj:`ak.contents.NumpyArray`) in the tuple. Also, the ``context`` is unchanged
outside of the function.

On the other hand, if we do the same with a ``lateral_context``,

.. code-block:: python


    >>> def crawl(layout, lateral_context, **kwargs):
    ...     lateral_context["types"] = lateral_context["types"] + (type(layout).__name__,)
    ...     print(lateral_context["types"])
    ...
    >>> context = {"types": ()}
    >>> ak.transform(crawl, array, lateral_context=context, return_value="none")
    ('ListOffsetArray',)
    ('ListOffsetArray', 'RecordArray')
    ('ListOffsetArray', 'RecordArray', 'ListOffsetArray')
    ('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray')
    ('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray', 'NumpyArray')
    >>> context
    {'types': ('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray', 'NumpyArray')}

The data accumulate through the walk over the tree. There are two leaf-types
(:py:obj:`ak.contents.NumpyArray`) in the tuple because this tree has two leaves.
The data are even available outside of the function, so ``lateral_context`` can
be paired with ``return_value="none"`` to extract non-array data, rather than
transforming the array.

The visitation order is stable: a recursive walk always proceeds through the
same tree in the same order.

Continuation
============

The ``transformation`` function is given the untransformed input layout or layouts.
Some algorithms need to perform a correction on transformed outputs instead, so
``continuation()`` can be called at any point to continue the descent and obtain
the transformed result.

For example, this function inserts an option-type at every level of an array:

.. code-block:: python


    >>> def insert_optiontype(layout, continuation, **kwargs):
    ...     return ak.contents.UnmaskedArray(continuation())
    ...
    >>> array = ak.Array([[[[[1.1, 2.2, 3.3], []]], []], [[[[4.4, 5.5]]]]])
    >>> array.type.show()
    2 * var * var * var * var * float64

    >>> array2 = ak.transform(insert_optiontype, array)
    >>> array2.type.show()
    2 * option[var * option[var * option[var * option[var * ?float64]]]]

In the original array, every node is a :py:obj:`ak.contents.ListOffsetArray` except
the leaf, which is a :py:obj:`ak.contents.NumpyArray`. The call to ``continuation()``
returns a :py:obj:`ak.contents.ListOffsetArray` with its contents transformed, which
is the argument of a new :py:obj:`ak.contents.UnmaskedArray`.

To see this process as it happens, we can add ``print`` statements to the function.

.. code-block:: python


    >>> def insert_optiontype(input, continuation, **kwargs):
    ...     print("before", input.form.type)
    ...     output = ak.contents.UnmaskedArray(continuation())
    ...     print("after ", output.form.type)
    ...     return output
    ...
    >>> ak.transform(insert_optiontype, array)
    before var * var * var * var * float64
    before var * var * var * float64
    before var * var * float64
    before var * float64
    before float64
    after  ?float64
    after  option[var * ?float64]
    after  option[var * option[var * ?float64]]
    after  option[var * option[var * option[var * ?float64]]]
    after  option[var * option[var * option[var * option[var * ?float64]]]]
    <Array [[[[[1.1, ..., 3.3], ...]], ...], ...] type='2 * option[var * option...'>

Broadcasting
============

When multiple arrays are provided (``more_arrays``), all of the arrays are
broadcasted during the walk so that the ``transformation`` function is eventually
provided with a list of layouts that have compatible types (for mathematical
operations, etc.).

For instance, given these two arrays:

.. code-block:: python


    >>> array1 = ak.Array([[1, 2, 3], [], None, [4, 5]])
    >>> array2 = ak.Array([10, 20, 30, 40])

The following single-array function shows the nodes encountered when walking
down either one of them.

.. code-block:: python


    >>> def one_array(layout, **kwargs):
    ...     print(type(layout).__name__)
    ...
    >>> ak.transform(one_array, array1, return_value="none")
    IndexedOptionArray
    ListOffsetArray
    NumpyArray
    >>> ak.transform(one_array, array2, return_value="none")
    NumpyArray

The first array has three nested nodes; the second has only one node.

However, when the following two-array function is applied,

.. code-block:: python


    >>> def two_arrays(layouts, **kwargs):
    ...     assert len(layouts) == 2
    ...     print(type(layouts[0]).__name__, ak.to_list(layouts[0]))
    ...     print(type(layouts[1]).__name__, ak.to_list(layouts[1]))
    ...     print()
    ...
    >>> ak.transform(two_arrays, array1, array2)
    RegularArray [[[1, 2, 3], [], None, [4, 5]]]
    RegularArray [[10, 20, 30, 40]]

    IndexedOptionArray [[1, 2, 3], [], None, [4, 5]]
    NumpyArray [10, 20, 30, 40]

    ListArray [[1, 2, 3], [], [4, 5]]
    NumpyArray [10, 20, 40]

    NumpyArray [1, 2, 3, 4, 5]
    NumpyArray [10, 10, 10, 40, 40]

    (<Array [[1, 2, 3], [], None, [4, 5]] type='4 * option[var * int64]'>,
     <Array [[10, 10, 10], [], None, [40, 40]] type='4 * option[var * int64]'>)

The incompatible types of the two arrays eventually become the same type by
duplicating and removing values wherever necessary. If you cannot perform an
operation on a :py:obj:`ak.contents.ListArray` and a :py:obj:`ak.contents.NumpyArray`,
wait for a later iteration, in which both will be :py:obj:`ak.contents.NumpyArray`
(if the original arrays are broadcastable).

The return value, without transformation, is the same as what
:py:obj:`ak.broadcast_arrays` would return. See :py:obj:`ak.broadcast_arrays` for an
explanation of ``left_broadcast`` and ``right_broadcast``.
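
The equivalence with :py:obj:`ak.broadcast_arrays` can be checked with a transformation that never replaces a node. This is only a sketch: the function name ``do_nothing`` and the sample arrays are chosen for illustration.

.. code-block:: python

    import awkward as ak

    def do_nothing(layouts, **kwargs):
        # Returning None at every node leaves the broadcasted layouts untouched.
        return None

    array1 = ak.Array([[1, 2, 3], [], [4, 5]])
    array2 = ak.Array([10, 20, 30])

    transformed = ak.transform(do_nothing, array1, array2)
    broadcasted = ak.broadcast_arrays(array1, array2)

    # Compare the two results element by element as Python lists.
    for left, right in zip(transformed, broadcasted):
        print(left.to_list(), right.to_list())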

Broadcasting Parameters
=======================

When broadcasting multiple arrays with parameters, there are different ways of
assigning parameters to the outputs. The assignment of array parameters happens
at every level above the transformation action.

The method of parameter assignment used by the broadcasting routine is controlled
by the ``broadcast_parameters_rule`` option, which can take one of the following
values:

``"intersect"``
    The parameters of each output array will correspond to the intersection
    of the parameters from each of the input arrays.

``"all_or_nothing"``
    If the parameters of the input arrays are all equal, then they will be used
    for each output array. Otherwise, the output arrays will not be given
    parameters.

``"one_to_one"``
    If the number of output arrays matches the number of input arrays, then the
    output arrays are given the parameters of the input arrays. Otherwise, a
    ValueError is raised.

``"none"``
    The output arrays will not be given parameters.
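
As a hedged sketch of two of these rules, the example below attaches made-up parameters (``"tag"`` and ``"extra"``) to the list nodes of two arrays and compares ``"intersect"`` with ``"none"``. The function name ``add_leaves`` is also illustrative, and the exact parameters seen on the output may depend on the Awkward Array version.

.. code-block:: python

    import awkward as ak

    def add_leaves(layouts, **kwargs):
        left, right = layouts
        if left.is_numpy and right.is_numpy:
            return ak.contents.NumpyArray(left.data + right.data)

    # Both outer list nodes carry "tag"; only the first also carries "extra".
    array1 = ak.with_parameter(ak.Array([[1, 2], [3]]), "tag", "a")
    array1 = ak.with_parameter(array1, "extra", 1)
    array2 = ak.with_parameter(ak.Array([[10, 20], [30]]), "tag", "a")

    kept = ak.transform(
        add_leaves, array1, array2, broadcast_parameters_rule="intersect"
    )
    dropped = ak.transform(
        add_leaves, array1, array2, broadcast_parameters_rule="none"
    )
    print(ak.parameters(kept))     # parameters shared by both inputs
    print(ak.parameters(dropped))  # no parameters carried over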


Performance Tip
===============

:py:obj:`ak.transform` will traverse the layout of (potentially multiple) arrays once.
This can be useful if one wants to apply a batch of transformations in one single
layout traversal. Traversing the layout multiple times can be inefficient.

Consider the following example:

.. code-block:: python


    >>> def batch_of_operations(array):
    ...     return np.sqrt(np.sin(array) + 1) - 1
    ...
    >>> def apply_batch_of_operations(layout, **kwargs):
    ...     if layout.is_numpy:
    ...         return ak.contents.NumpyArray(
    ...             batch_of_operations(layout.data)
    ...         )
    ...
    >>> array = ak.Array(
    ... [[[[[1.1, 2.2, 3.3], []], None], []],
    ...  [[[[4.4, 5.5]]]]]
    ... )
    >>> %timeit ak.transform(apply_batch_of_operations, array)
    68.5 μs ± 663 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    >>> %timeit batch_of_operations(array)
    1.07 ms ± 39.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

The first ``%timeit`` cell measures the batch of operations applied through :py:obj:`ak.transform`,
which performs all of them in a single traversal of the layout. The second ``%timeit`` cell measures
the same operations applied directly to the array, which traverses the layout once for each
operation.


See also: :py:obj:`ak.is_valid` and :py:obj:`ak.valid_when` to check the validity of transformed
outputs.