ak.str.to_categorical
Defined in awkward.operations.str.akstr_to_categorical on line 13.
- ak.str.to_categorical(array, *, highlevel=True, behavior=None, attrs=None)
- Parameters:
  - array – Array-like data (anything ak.to_layout recognizes).
  - highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.
  - behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
  - attrs (None or dict) – Custom attributes for the output array, if high-level.
Returns a dictionary-encoded version of the given array of strings. Creates a categorical dataset, which has the following properties:
- only distinct values (categories) are stored in their entirety,
- pointers to those distinct values are represented by integers (an ak.contents.IndexedArray or ak.contents.IndexedOptionArray labeled with parameter "__array__" = "categorical").
This is equivalent to R’s “factor” and pandas’s “categorical”.
It differs from generic uses of ak.contents.IndexedArray and ak.contents.IndexedOptionArray in Awkward Arrays by the guarantee of no duplicate categories and the "categorical" parameter.
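Unlike Arrow’s dictionary_encode, this function has no null_handling argument. This function’s behavior is like null_handling="mask" (Arrow’s default): missing values are masked out rather than stored as a category.
Arrow’s alternative, null_handling="encode", which stores null as one of the categories, is not possible in Awkward Array, because the content of an ak.contents.IndexedOptionArray cannot be an option-type node.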
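As a rough illustration of dictionary encoding with masked nulls, here is a pure-Python sketch. It is not the Awkward Array or Arrow implementation, and the helper names dictionary_encode and decode are hypothetical, chosen only for this example. Each distinct string is stored once, every element becomes an integer pointer, and a missing value gets a negative index instead of its own category (similar in spirit to how ak.contents.IndexedOptionArray marks missing entries with negative index values).

```python
def dictionary_encode(values):
    """Toy dictionary encoding (hypothetical helper, not a library API).

    Returns (categories, index), where missing values are masked with -1.
    """
    categories = []  # distinct values, in order of first appearance
    positions = {}   # value -> position in `categories`
    index = []       # one integer per input element
    for value in values:
        if value is None:
            index.append(-1)  # masked: None never becomes a category
            continue
        if value not in positions:
            positions[value] = len(categories)
            categories.append(value)
        index.append(positions[value])
    return categories, index


def decode(categories, index):
    """Rebuild the original list from categories and integer pointers."""
    return [None if i < 0 else categories[i] for i in index]


data = ["one", "two", "one", None, "three", "two"]
categories, index = dictionary_encode(data)
print(categories)  # ['one', 'two', 'three']
print(index)       # [0, 1, 0, -1, 2, 1]
print(decode(categories, index) == data)  # True
```

The categories list contains no duplicates and no None, which mirrors the two guarantees described above.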
Note: this function does not raise an error if the array does not contain any string or bytestring data.
Requires the pyarrow library and calls pyarrow.compute.dictionary_encode.