R/marrrow.R
marrow.Rd
map + arrow: apply a function to each element of a vector or list and collate the results into an Arrow dataset. The whole dataset is never held in memory at once, so this is suitable for large data objects. The function must return a data.frame or tibble. marrow_dir returns the path to the directory containing the Arrow dataset; marrow_ds and marrow_files return the dataset itself or the paths to its files, respectively.
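The general idea can be sketched with the arrow package alone. This is a minimal illustration under stated assumptions, not the implementation used by these functions: `make_chunk` is a hypothetical stand-in for `.f`, and the output file names are arbitrary.

```r
library(arrow)

# Hypothetical stand-in for .f: returns one small data.frame per input value.
make_chunk <- function(i) {
  data.frame(group = i, value = seq_len(3) * i)
}

out_dir <- file.path(tempdir(), "marrow-sketch")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Write each result to its own Parquet file, so only one chunk
# is held in memory at a time.
for (i in 1:4) {
  chunk <- make_chunk(i)
  write_parquet(chunk, file.path(out_dir, sprintf("part-%d.parquet", i)))
}

# Open the directory lazily as a single Arrow Dataset.
ds <- open_dataset(out_dir, format = "parquet")
```

Because `open_dataset()` only reads metadata up front, the collated data can be queried without loading every file.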
marrow_dir(.x, .f, ..., .path, .partitioning = c(), .format = "parquet")
marrow_ds(.x, .f, ..., .path, .partitioning = c(), .format = "parquet")
marrow_files(.x, .f, ..., .path, .partitioning = c(), .format = "parquet")
Argument | Description |
---|---|
.x | vector or list of values for .f to iterate over |
.f | function; must return a data.frame/tibble |
... | other arguments to .f |
.path | path to the directory where the collated Arrow dataset will be stored. It will be created if it does not exist. |
.partitioning | character vector of columns to use for partitioning. Columns must exist in output of .f. |
.format | "parquet" (the default) or "arrow". |
marrow_dir: path to the new dataset directory; a character string of length one.
marrow_ds: an Arrow Dataset.
marrow_files: a character vector containing the paths to all files in the dataset directory.
marrow_dir: Return path to directory containing dataset
marrow_ds: Return Arrow Dataset
marrow_files: Return paths to all files in dataset dir
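A possible usage pattern is sketched below. It assumes the package exporting `marrow_ds()` and `marrow_files()` is already attached (its name is not given on this page), so the calls are guarded. `simulate_year` and the output paths are hypothetical names introduced only for illustration.

```r
# Hypothetical .f: builds one small data.frame per input value.
simulate_year <- function(year, n = 3) {
  data.frame(year = year, id = seq_len(n), value = year + seq_len(n))
}

# Run only if the functions documented above are available in this session.
if (exists("marrow_ds", mode = "function")) {
  # Returns an Arrow Dataset partitioned by the `year` column;
  # n = 3 is forwarded to .f through `...`.
  ds <- marrow_ds(
    .x = 2020:2022,
    .f = simulate_year,
    n = 3,
    .path = file.path(tempdir(), "marrow-example-ds"),
    .partitioning = "year"
  )

  # Returns the paths of the files that make up the dataset.
  files <- marrow_files(
    .x = 2020:2022,
    .f = simulate_year,
    .path = file.path(tempdir(), "marrow-example-files"),
    .format = "arrow"
  )
}
```

The partitioning column must be present in every data.frame returned by `.f`, which is why `simulate_year` includes `year` in its output.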