ak.str.to_categorical
---------------------

.. py:module: ak.str.to_categorical

Defined in `awkward.operations.str.akstr_to_categorical <https://github.com/scikit-hep/awkward/blob/36da52cfa8846355c390beb6555eac1d31c27c26/src/awkward/operations/str/akstr_to_categorical.py>`__ on `line 13 <https://github.com/scikit-hep/awkward/blob/36da52cfa8846355c390beb6555eac1d31c27c26/src/awkward/operations/str/akstr_to_categorical.py#L13>`__.

.. py:function:: ak.str.to_categorical(array, *, highlevel=True, behavior=None, attrs=None)


    :param array: Array-like data (anything :py:obj:`ak.to_layout` recognizes).
    :param highlevel: If True, return an :py:obj:`ak.Array`; otherwise, return
                  a low-level :py:obj:`ak.contents.Content` subclass.
    :type highlevel: bool
    :param behavior: Custom :py:obj:`ak.behavior` for the output array, if
                 high-level.
    :type behavior: None or dict
    :param attrs: Custom attributes for the output array, if
              high-level.
    :type attrs: None or dict

Returns a dictionary-encoded version of the given array of strings.
The result is a categorical dataset, which has the following properties:

   * only distinct values (categories) are stored in their entirety,
   * pointers to those distinct values are represented by integers
     (an :py:obj:`ak.contents.IndexedArray` or :py:obj:`ak.contents.IndexedOptionArray`
     labeled with parameter ``"__array__" = "categorical"``).

This is equivalent to R's "factor" and pandas's "categorical".
It differs from generic uses of :py:obj:`ak.contents.IndexedArray` and
:py:obj:`ak.contents.IndexedOptionArray` in Awkward Arrays by the guarantee of no
duplicate categories and the ``"categorical"`` parameter.
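
The two properties above can be sketched in plain Python. This is only an
illustration of dictionary encoding, not how this function is implemented;
``encode_categories`` is a hypothetical helper name introduced for the example.

.. code-block:: python

    def encode_categories(values):
        """Split a list of strings into distinct categories and integer pointers."""
        categories = []  # distinct values, in order of first appearance
        position = {}    # maps each category to its position in `categories`
        index = []       # one integer per input element
        for value in values:
            if value not in position:
                position[value] = len(categories)
                categories.append(value)
            index.append(position[value])
        return categories, index


    categories, index = encode_categories(["one", "two", "one", "three", "two"])
    print(categories)  # ['one', 'two', 'three']
    print(index)       # [0, 1, 0, 2, 1]

Each distinct string is stored once, and the original sequence can be recovered
by looking up every integer in ``categories``.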


Unlike Arrow's ``dictionary_encode``, this function has no ``null_encoding``
argument. This function's behavior is like ``null_encoding="mask"`` (Arrow's default).
It is not possible to encode null values in Awkward Array, as :py:obj:`ak.contents.IndexedOptionArray`
cannot contain an option type node.
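
For reference, the "mask" behavior can be observed by calling Arrow directly.
The sketch below assumes pyarrow is installed and uses pyarrow's Python keyword
for this option, ``null_encoding``; it shows Arrow's behavior, not this
function's internals.

.. code-block:: python

    import pyarrow as pa
    import pyarrow.compute as pc

    strings = pa.array(["one", None, "one", "two"])

    # With "mask", a null input becomes a null index; it is not added
    # to the dictionary of categories.
    encoded = pc.dictionary_encode(strings, null_encoding="mask")

    print(encoded.dictionary.to_pylist())  # ['one', 'two']
    print(encoded.indices.to_pylist())     # [0, None, 0, 1]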

Note: this function does not raise an error if the ``array`` does not
contain any string or bytestring data.

Requires the pyarrow library and calls
`pyarrow.compute.dictionary_encode <https://arrow.apache.org/docs/python/generated/pyarrow.compute.dictionary_encode.html>`__.
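
A minimal usage sketch, assuming both awkward and pyarrow are installed. The
input values are made up for illustration, and the printed output is not shown
because its exact formatting may vary between versions.

.. code-block:: python

    import awkward as ak

    # String data is dictionary-encoded.
    array = ak.Array(["one", "two", "one", "three", "two"])
    categorical = ak.str.to_categorical(array)

    print(ak.is_categorical(categorical))
    print(ak.categories(categorical))
    print(categorical.to_list())

    # Non-string data does not raise an error.
    numbers = ak.to_list(ak.str.to_categorical(ak.Array([1, 2, 3])))
    print(numbers)

:py:obj:`ak.is_categorical` and :py:obj:`ak.categories` are used here only to
inspect the result; converting back with ``to_list`` should give the original
strings.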