ak.from_parquet
Defined in awkward.operations.ak_from_parquet on line 17.
- ak.from_parquet(path, *, columns=None, row_groups=None, storage_options=None, max_gap=64000, max_block=256000000, footer_sample_size=1000000, generate_bitmasks=False, highlevel=True, behavior=None, attrs=None)
- Parameters:
path (str) – Local filename or remote URL, passed to fsspec for resolution. May contain glob patterns.
columns (None, str, or iterable of (str or iterable of str)) – Glob pattern(s), including bash-like curly brackets, for matching column names. Nested records are separated by dots. If a list of patterns is given, a column is read if it matches any of them (logical-or). If None, all columns are read. A list of lists can be provided to select columns with literal dots in their names; each inner list provides column names or patterns.
row_groups (None or set of int) – Row groups to read; must be non-negative. Order is ignored: the output array is presented in the order specified by Parquet metadata. If None, all row groups/all rows are read.
storage_options – Passed to fsspec.parquet.open_parquet_file.
max_gap (int) – Passed to fsspec.parquet.open_parquet_file.
max_block (int) – Passed to fsspec.parquet.open_parquet_file.
footer_sample_size (int) – Passed to fsspec.parquet.open_parquet_file.
generate_bitmasks (bool) – If enabled and Arrow/Parquet does not have Awkward metadata, generate_bitmasks=True creates empty bitmasks for nullable types that don’t have bitmasks in the Arrow/Parquet data, so that the Form (BitMaskedForm vs UnmaskedForm) is predictable.
highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.
behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
attrs (None or dict) – Custom attributes for the output array, if high-level.
Reads data from a local or remote Parquet file or collection of files.
The data are eagerly (not lazily) read and must fit into memory. Use columns and/or row_groups to select and filter manageable subsets of the data, and use ak.metadata_from_parquet to find column names and the range of row groups that a dataset has.
See also ak.to_parquet, ak.metadata_from_parquet.