ak.from_parquet

Defined in awkward.operations.ak_from_parquet on line 17.

ak.from_parquet(path, *, columns=None, row_groups=None, storage_options=None, max_gap=64000, max_block=256000000, footer_sample_size=1000000, generate_bitmasks=False, highlevel=True, behavior=None, attrs=None)
Parameters:
  • path (str) – Local filename or remote URL, passed to fsspec for resolution. May contain glob patterns.

  • columns (None, str, or iterable of (str or iterable of str)) – Glob pattern(s), including bash-like curly brackets, for matching column names. Nested records are separated by dots. If a list of patterns is given, a column is read if it matches any of them (logical-or). If None, all columns are read. A list of lists can be provided to select columns with literal dots in their names; each inner list provides column names or patterns.

  • row_groups (None or set of int) – Row groups to read; must be non-negative. Order is ignored: the output array is presented in the order specified by Parquet metadata. If None, all row groups/all rows are read.

  • storage_options – Passed to fsspec.parquet.open_parquet_file.

  • max_gap (int) – Passed to fsspec.parquet.open_parquet_file.

  • max_block (int) – Passed to fsspec.parquet.open_parquet_file.

  • footer_sample_size (int) – Passed to fsspec.parquet.open_parquet_file.

  • generate_bitmasks (bool) – If True and the Arrow/Parquet data do not have Awkward metadata, create empty bitmasks for nullable types that don’t have bitmasks in the Arrow/Parquet data, so that the Form (BitMaskedForm vs UnmaskedForm) is predictable.

  • highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.

  • behavior (None or dict) – Custom ak.behavior for the output array, if high-level.

  • attrs (None or dict) – Custom attributes for the output array, if high-level.

Reads data from a local or remote Parquet file or collection of files.

The data are eagerly (not lazily) read and must fit into memory. Use columns and/or row_groups to select and filter manageable subsets of the data, and use ak.metadata_from_parquet to find column names and the range of row groups that a dataset has.

See also ak.to_parquet, ak.metadata_from_parquet.