# Dask backend behavior
When you pass a dask-backed DataArray to an xarray-spatial function, the result should also be dask-backed, so your pipeline stays lazy until you call `.compute()`. Most functions do this, but some algorithms need random access to the full array and have to materialize intermediate results.

This page lists every public function and its laziness level, so you can plan dask pipelines without reading the source code.
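To confirm that a step kept your pipeline lazy, check whether the returned DataArray still wraps a dask array. This is a minimal sketch that assumes an xarray `DataArray` input. `is_dask_backed` is a hypothetical helper, not an xarray-spatial function.

```python
import dask.array as da
import numpy as np
import xarray as xr


def is_dask_backed(obj: xr.DataArray) -> bool:
    """Return True if the DataArray wraps a dask array (i.e. it is still lazy)."""
    # xarray exposes the wrapped array as .data; for a chunked DataArray
    # this is a dask.array.Array rather than a NumPy ndarray.
    return isinstance(obj.data, da.Array)


# A small raster, chunked into 2x2 blocks so xarray wraps it in dask.
raster = xr.DataArray(np.arange(16.0).reshape(4, 4), dims=("y", "x")).chunk(2)

lazy = raster * 2              # arithmetic on a dask-backed array stays lazy
eager = lazy.compute()         # .compute() returns a NumPy-backed DataArray

print(is_dask_backed(lazy))    # True
print(is_dask_backed(eager))   # False
```

This check only looks at the return type. A fully materialized function that re-wraps its result as dask also returns True.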
## Laziness levels

- **Fully lazy**: the function returns a dask array without triggering any computation. It is safe for arbitrarily large out-of-core datasets.
- **Partially lazy**: the function computes small, bounded statistics (scalars, quartiles, or a sample of ~20K values) during setup, then returns a dask array for the main result. The statistics are cheap; the heavy work stays lazy.
- **Fully materialized**: the algorithm needs the entire array in memory (connected-component labeling, A* search, viewshed sweepline, etc.). The result may be re-wrapped as dask, but the function calls `.compute()` internally, so watch your memory on large inputs.
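You can also observe directly when computation happens. dask accepts a callable as its scheduler, so a thin wrapper around the synchronous `dask.get` scheduler can count how many times a graph runs. The sketch below assumes a plain dask array and uses a hypothetical `ComputeCounter` helper. A fully lazy step leaves the count at 0 until `.compute()`. Partially lazy and fully materialized functions increment the count during the call itself.

```python
import dask
import dask.array as da
import numpy as np


class ComputeCounter:
    """A callable dask scheduler that counts how many times a graph is executed.

    When a callable is passed as ``scheduler`` to ``dask.config.set``, dask
    routes every compute through it. This wrapper increments a counter and then
    delegates to the synchronous scheduler.
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, dsk, keys, **kwargs):
        self.calls += 1
        return dask.get(dsk, keys, **kwargs)


x = da.from_array(np.arange(100.0).reshape(10, 10), chunks=5)
counter = ComputeCounter()

with dask.config.set(scheduler=counter):
    # Building the pipeline: fully lazy steps leave calls at 0.
    result = (x - x.mean()) * 2
    calls_while_building = counter.calls
    # Materializing the result runs the graph once.
    values = result.compute()

print(calls_while_building, counter.calls)  # 0 1
```

Wrapping an xarray-spatial call in the same `with` block is a quick way to check which level it falls into in the version you have installed.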
## Terrain metrics

| Function | Laziness | Notes |
|---|---|---|
| | Fully lazy | |
| | Fully lazy | |
| | Fully lazy | |
| | Fully lazy | |
| | Fully lazy | Uses |
| | Fully lazy | Uses |
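Neighbourhood-based raster calculations can stay fully lazy because each chunk only needs a thin halo of cells from its neighbours. The sketch below shows that general pattern using `dask.array.Array.map_overlap`. A `numpy.gradient` magnitude kernel stands in for a terrain metric. This illustrates the technique only and is not xarray-spatial's own implementation. `gradient_magnitude` is a hypothetical name.

```python
import dask.array as da
import numpy as np


def gradient_magnitude(block: np.ndarray) -> np.ndarray:
    # Central differences need one neighbour on each side; np.gradient
    # returns one array per axis, each the same shape as the input block.
    dy, dx = np.gradient(block)
    return np.hypot(dy, dx)


elevation = np.random.default_rng(0).random((12, 12))
x = da.from_array(elevation, chunks=4)

# depth=1 adds a one-cell halo from neighbouring chunks before calling the
# kernel and trims it afterwards, so results at chunk seams are correct.
# boundary="reflect" pads the outer edge of the whole array.
lazy_result = x.map_overlap(gradient_magnitude, depth=1, boundary="reflect", dtype=float)

print(isinstance(lazy_result, da.Array))  # True: nothing computed yet

# Interior cells match the single-array NumPy result; only the outer ring
# differs, because of the reflect padding at the array edge.
expected = gradient_magnitude(elevation)
interior_matches = np.allclose(lazy_result.compute()[1:-1, 1:-1], expected[1:-1, 1:-1])
print(interior_matches)  # True
```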
## Focal operations

| Function | Laziness | Notes |
|---|---|---|
| | Fully lazy | Iterative |
| | Fully lazy | |
| | Fully lazy | Multiple stats via |
| | Fully lazy | Global mean/std/count stay lazy; degenerate-input check fires at compute |
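The last focal entry keeps its global statistics as lazy 0-d dask arrays and defers its sanity check to compute time. The sketch below shows that pattern for a z-score style standardization, treating a zero standard deviation as degenerate input. `lazy_zscore` and `_require_nonzero` are hypothetical names, and the check xarray-spatial actually performs may differ.

```python
import dask.array as da
import numpy as np


def _require_nonzero(std: np.ndarray) -> np.ndarray:
    # Runs only when the graph is executed, so the check fires at compute.
    if np.any(std == 0):
        raise ValueError("standard deviation is zero; input is constant")
    return std


def lazy_zscore(x: da.Array) -> da.Array:
    """Standardize x using global mean/std without triggering a compute."""
    mean = x.mean()                                          # 0-d dask array, lazy
    std = x.std().map_blocks(_require_nonzero, dtype=float)  # lazy check
    return (x - mean) / std


ok = lazy_zscore(da.from_array(np.arange(16.0).reshape(4, 4), chunks=2))
flat = lazy_zscore(da.ones((4, 4), chunks=2))  # degenerate, but no error yet

print(round(float(ok.compute().std()), 6))  # 1.0
try:
    flat.compute()
    check_fired = False
except ValueError as err:
    check_fired = True
    print("raised at compute:", err)
```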
## Classification

| Function | Laziness | Notes |
|---|---|---|
| | Fully lazy | |
| | Fully lazy | |
| | Partially lazy | Computes percentiles from ~20K sample |
| | Partially lazy | Computes Jenks breaks from ~20K sample + scalar max |
| | Partially lazy | Computes scalar min/max |
| | Partially lazy | Computes scalar mean/std/max |
| | Partially lazy | Computes O(log N) scalar means |
| | Partially lazy | Computes percentiles from ~20K sample |
| | Partially lazy | Computes breaks from ~20K sample |
| | Partially lazy | Computes scalar quartiles and max |
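The partially lazy classifiers follow a two-phase pattern: compute a bounded summary eagerly, then classify every cell lazily. The sketch below applies that pattern to quantile classes, assuming a uniform random sample of at most 20,000 cells. `sampled_quantile_classes` is a hypothetical helper, and xarray-spatial's own sampling and break computation may differ.

```python
import dask.array as da
import numpy as np


def sampled_quantile_classes(x: da.Array, k: int = 4, num_sample: int = 20_000,
                             seed: int = 0) -> da.Array:
    """Partially lazy k-class quantile classification (illustrative sketch).

    Setup computes a bounded random sample eagerly; the per-cell
    classification of the full array stays lazy.
    """
    flat = x.ravel()
    n = flat.shape[0]
    rng = np.random.default_rng(seed)
    # Sorted indices keep dask's integer indexing simple; the sample is small.
    idx = np.sort(rng.choice(n, size=min(n, num_sample), replace=False))
    sample = flat[idx].compute()  # eager, but bounded in size
    # k classes need k - 1 interior break points.
    breaks = np.percentile(sample, np.linspace(0, 100, k + 1)[1:-1])
    # da.digitize builds a lazy graph; nothing is computed here.
    return da.digitize(x, breaks)


x = da.from_array(np.arange(100.0).reshape(10, 10), chunks=5)
classes = sampled_quantile_classes(x, k=4)
print(isinstance(classes, da.Array))           # True
print(np.bincount(classes.compute().ravel()))  # [25 25 25 25]
```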
## Normalization

| Function | Laziness | Notes |
|---|---|---|
| | Fully lazy | |
| | Fully lazy | |
## Visibility

| Function | Laziness | Notes |
|---|---|---|
| | Fully materialized | Sweepline needs random access. Grids that exceed memory (no |
| | Fully materialized | Extracts 1D transect via |
| | Fully materialized | Runs multiple viewshed calls |
| | Fully materialized | Wraps |
## Morphology

| Function | Laziness | Notes |
|---|---|---|
| | Fully materialized | Connected-component labeling needs the full array; result re-wrapped as dask |
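The fully materialized functions share a simple shape: compute the whole array, run an in-memory algorithm, then optionally wrap the result back into dask. The sketch below follows that pattern. `materialize_and_rewrap` and `label_4connected` are hypothetical helpers. The labeling is a plain breadth-first flood fill and is not the routine xarray-spatial uses.

```python
from collections import deque

import dask.array as da
import numpy as np


def label_4connected(mask: np.ndarray) -> np.ndarray:
    """Label 4-connected True regions with 1, 2, ...; background is 0."""
    labels = np.zeros(mask.shape, dtype=np.int64)
    current = 0
    rows, cols = mask.shape
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or labels[r0, c0]:
                continue
            current += 1
            labels[r0, c0] = current
            queue = deque([(r0, c0)])
            # Breadth-first flood fill. A region can cross any chunk boundary,
            # which is why the whole array must be in memory.
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not labels[nr, nc]:
                        labels[nr, nc] = current
                        queue.append((nr, nc))
    return labels


def materialize_and_rewrap(x: da.Array, func) -> da.Array:
    # .compute() pulls the full array into memory: this is the step that
    # makes a function "fully materialized".
    result = func(x.compute())
    # Re-wrap with the input's chunking so downstream steps see a dask array.
    return da.from_array(result, chunks=x.chunks)


mask = np.array([
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
], dtype=bool)
x = da.from_array(mask, chunks=2)
labels = materialize_and_rewrap(x, label_4connected)
print(labels.chunks == x.chunks)     # True
print(int(labels.max().compute()))   # 3
```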
## Proximity

| Function | Laziness | Notes |
|---|---|---|
| | Fully materialized | Distance computation needs full array |
| | Fully materialized | Nearest-source allocation |
| | Fully materialized | Direction to nearest source |
## Zonal

| Function | Laziness | Notes |
|---|---|---|
| | Partially lazy | Groupby aggregation via dask dataframe |
| | Partially lazy | Groupby cross-tabulation |
| | Fully lazy | |
| | Fully materialized | Connected-component labeling |
| | Fully lazy | Lazy slicing |
| | Fully lazy | Lazy slicing |
## Pathfinding

| Function | Laziness | Notes |
|---|---|---|
| | Fully materialized | A* needs random access and visited-set tracking |
| | Fully materialized | Iterative A* |
## Vector conversion

| Function | Laziness | Notes |
|---|---|---|
| | Fully materialized | Polygonizes each chunk, then merges polygons across chunk edges |
`polygonize` runs per chunk and stitches the results, so it never holds the whole raster in memory at once. Chunks are polygonized in batches: each batch is one `dask.compute` call, so dask schedules the chunks in a batch in parallel instead of computing one chunk at a time. Peak memory is bounded by one batch's worth of per-chunk polygons plus the boundary polygons that accumulate for the merge. Large inputs split into many small chunks parallelize well; rasters whose chunks each produce many polygons use more memory per batch.
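The batching described above can be sketched as follows, assuming a per-chunk function and a fixed batch size. `process_in_batches` and `per_chunk_unique` are hypothetical names. The per-chunk work is a stand-in for polygonization, and the cross-chunk merge step is omitted.

```python
import dask
import dask.array as da
import numpy as np


def per_chunk_unique(block: np.ndarray) -> np.ndarray:
    # Stand-in for per-chunk polygonization: any function of one chunk.
    return np.unique(block)


def process_in_batches(x: da.Array, func, batch_size: int):
    """Apply func to every chunk, computing batch_size chunks per dask.compute."""
    blocks = x.to_delayed().ravel()  # one Delayed object per chunk
    results, n_batches = [], 0
    for start in range(0, len(blocks), batch_size):
        batch = blocks[start:start + batch_size]
        tasks = [dask.delayed(func)(b) for b in batch]
        # One compute call per batch: dask runs the batch's chunks in parallel,
        # and only one batch of per-chunk results is produced at a time.
        results.extend(dask.compute(*tasks))
        n_batches += 1
    return results, n_batches


x = da.from_array(np.arange(36).reshape(6, 6) % 7, chunks=2)  # 9 chunks
results, n_batches = process_in_batches(x, per_chunk_unique, batch_size=4)
print(len(results), n_batches)  # 9 3
```

A larger `batch_size` gives dask more chunks to run in parallel per call, at the cost of holding more per-chunk results at once.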