Dask backend behavior

When you pass a dask-backed DataArray to an xarray-spatial function, the result should also be dask-backed so your pipeline stays lazy until you call .compute(). Most functions do this, but some algorithms need random access to the full array and have to materialize intermediate results.

This page lists every public function and its laziness level so you can plan dask pipelines without reading source code.

Laziness levels

Fully lazy – the function returns a dask array without triggering any computation. Safe for arbitrarily large out-of-core datasets.

Partially lazy – the function computes small bounded statistics (scalars, quartiles, a ~20K sample) during setup, then returns a dask array for the main result. The statistics are cheap; the heavy work stays lazy.

Fully materialized – the algorithm needs the entire array in memory (connected-component labeling, A* search, viewshed sweepline, etc.). The result may be re-wrapped as dask, but the function calls .compute() internally. Watch your memory on large inputs.
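A quick way to see whether a result is still lazy is to inspect it before computing. The sketch below uses only xarray and dask, with no xarray-spatial call; `describe`, `elevation`, and `lazy_input` are illustrative names, not part of the library.

```python
import dask
import numpy as np
import xarray as xr


def describe(result):
    """Report whether a DataArray is dask-backed (lazy) or already in memory."""
    if dask.is_dask_collection(result):
        return "lazy"
    return "in memory"


# A small in-memory raster, then the same data split into 2x2 dask chunks.
elevation = xr.DataArray(np.arange(16.0).reshape(4, 4), dims=("y", "x"))
lazy_input = elevation.chunk({"y": 2, "x": 2})

# Arithmetic on a dask-backed DataArray only builds a task graph.
doubled = lazy_input * 2

print(describe(doubled))            # still dask-backed
print(describe(doubled.compute()))  # materialized as NumPy
```

Note that a dask-backed result does not by itself rule out the fully materialized level, since those functions may re-wrap their output as dask after computing internally.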

Terrain metrics

| Function | Laziness | Notes |
| --- | --- | --- |
| slope | Fully lazy | map_overlap, planar and geodesic |
| aspect | Fully lazy | map_overlap, planar and geodesic |
| curvature | Fully lazy | map_overlap |
| hillshade | Fully lazy | map_overlap |
| northness | Fully lazy | Uses da.cos / da.deg2rad on aspect output |
| eastness | Fully lazy | Uses da.sin / da.deg2rad on aspect output |
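To illustrate why map_overlap keeps neighborhood operations lazy, here is a minimal sketch of a 3x3 moving average written directly against dask. It is not the library's implementation: `mean_3x3` and the NaN edge handling are assumptions chosen for the example.

```python
import numpy as np
import dask.array as da


def mean_3x3(block):
    """3x3 moving average over a block that already includes a 1-cell halo."""
    out = np.full(block.shape, np.nan, dtype=np.float64)
    rows, cols = block.shape[0] - 2, block.shape[1] - 2
    if rows <= 0 or cols <= 0:
        # Too small for a full window (also covers dask's empty probe call).
        return out
    total = np.zeros((rows, cols), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            total += block[dy:dy + rows, dx:dx + cols]
    # Halo cells stay NaN; map_overlap trims them off afterwards.
    out[1:-1, 1:-1] = total / 9.0
    return out


elevation_np = np.arange(64, dtype=np.float64).reshape(8, 8)
elevation = da.from_array(elevation_np, chunks=(4, 4))

# depth=1 shares one row/column with each neighboring chunk;
# boundary=np.nan pads the outer edge of the raster with NaN.
smoothed = elevation.map_overlap(
    mean_3x3, depth=1, boundary=np.nan, dtype=np.float64
)

# Nothing has run yet; the work happens here.
result = smoothed.compute()
```

Each chunk only needs its own cells plus a one-cell halo, so no step requires the whole raster in memory.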

Focal operations

| Function | Laziness | Notes |
| --- | --- | --- |
| mean | Fully lazy | Iterative map_overlap |
| apply | Fully lazy | map_overlap with user kernel |
| focal_stats | Fully lazy | Multiple stats via map_overlap, 3D output |
| hotspots | Fully lazy | Global mean/std/count stay lazy; degenerate-input check fires at compute |

Classification

| Function | Laziness | Notes |
| --- | --- | --- |
| binary | Fully lazy | map_blocks |
| reclassify | Fully lazy | map_blocks |
| quantile | Partially lazy | Computes percentiles from ~20K sample |
| natural_breaks | Partially lazy | Computes Jenks breaks from ~20K sample + scalar max |
| equal_interval | Partially lazy | Computes scalar min/max |
| std_mean | Partially lazy | Computes scalar mean/std/max |
| head_tail_breaks | Partially lazy | Computes O(log N) scalar means |
| percentiles | Partially lazy | Computes percentiles from ~20K sample |
| maximum_breaks | Partially lazy | Computes breaks from ~20K sample |
| box_plot | Partially lazy | Computes scalar quartiles and max |
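The partially lazy pattern can be sketched in plain dask: compute a small, bounded sample up front, derive the bin edges from it, then classify chunk by chunk. This is an illustration under assumptions, not the library's code: the uniform random sampling, the quartile edges, and the `classify_block` helper are all hypothetical.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
values = da.from_array(rng.random((400, 400)), chunks=(100, 100))

# Setup step (eager but bounded): pull a fixed-size random sample.
sample_size = 20_000
flat = values.ravel()
picks = np.sort(rng.choice(flat.size, size=sample_size, replace=False))
sample = flat[picks].compute()

# Bin edges come from the sample, not from the full array.
edges = np.quantile(sample, [0.25, 0.5, 0.75])


def classify_block(block, edges):
    """Assign each cell the index of the bin it falls into."""
    return np.digitize(block, edges)


# Main result (lazy): one independent task per chunk.
classes = values.map_blocks(classify_block, edges=edges, dtype=np.intp)
```

Only the sample is held in memory during setup, so its cost stays the same no matter how large the input grows.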

Normalization

| Function | Laziness | Notes |
| --- | --- | --- |
| rescale | Fully lazy | da.nanmin / da.nanmax (lazy reductions) |
| standardize | Fully lazy | da.nanmean / da.nanstd (lazy reductions) |
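Lazy reductions are what keep these functions fully lazy: the global minimum and maximum are themselves dask objects, so they become part of the task graph instead of being computed during setup. A minimal sketch, assuming a target range of [0, 1] (the variable names are illustrative):

```python
import numpy as np
import dask.array as da

data = da.from_array(
    np.array([[1.0, 2.0], [np.nan, 5.0]]), chunks=(1, 2)
)

# These are 0-d dask arrays, not Python floats: nothing is computed yet.
lowest = da.nanmin(data)
highest = da.nanmax(data)

# The rescaled array depends on the reductions through the task graph.
rescaled = (data - lowest) / (highest - lowest)

print(rescaled.compute())
```

NaN cells are ignored by the reductions and stay NaN in the output.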

Visibility

| Function | Laziness | Notes |
| --- | --- | --- |
| viewshed | Fully materialized | Sweepline needs random access. Grids that exceed memory (no max_distance) use an approximate out-of-core sweep that does not match the exact result and emits a UserWarning; set max_distance for exact results. |
| line_of_sight | Fully materialized | Extracts 1D transect via .compute() |
| cumulative_viewshed | Fully materialized | Runs multiple viewshed calls |
| visibility_frequency | Fully materialized | Wraps cumulative_viewshed |

Morphology

| Function | Laziness | Notes |
| --- | --- | --- |
| sieve | Fully materialized | Connected-component labeling needs the full array; result re-wrapped as dask |
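The "materialize, then re-wrap" pattern can be sketched as follows. The labeling routine is a deliberately simple 4-connected flood fill written for this example; it is not the algorithm the library uses, and `label_regions` is a hypothetical name.

```python
import numpy as np
import dask.array as da


def label_regions(grid):
    """Label 4-connected regions of equal value; needs the whole grid at once."""
    labels = np.zeros(grid.shape, dtype=np.int32)
    rows, cols = grid.shape
    current = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r, c] != 0:
                continue
            current += 1
            value = grid[r, c]
            labels[r, c] = current
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny, nx] == 0 and grid[ny, nx] == value):
                        labels[ny, nx] = current
                        stack.append((ny, nx))
    return labels


grid_np = np.array([[1, 1, 2, 2],
                    [1, 2, 2, 2],
                    [3, 3, 2, 1],
                    [3, 3, 1, 1]])
grid = da.from_array(grid_np, chunks=(2, 2))

# Step 1: materialize. The region of 2s crosses chunk edges, so
# independent per-chunk labeling would split it.
labels = label_regions(grid.compute())

# Step 2: re-wrap with the input's chunking so downstream code sees dask.
rewrapped = da.from_array(labels, chunks=grid.chunks)
```

The output is dask-backed, but the peak memory cost was already paid in step 1.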

Proximity

| Function | Laziness | Notes |
| --- | --- | --- |
| proximity | Fully materialized | Distance computation needs full array |
| allocation | Fully materialized | Nearest-source allocation |
| direction | Fully materialized | Direction to nearest source |

Zonal

| Function | Laziness | Notes |
| --- | --- | --- |
| zonal_stats / stats | Partially lazy | Groupby aggregation via dask dataframe |
| zonal_crosstab / crosstab | Partially lazy | Groupby cross-tabulation |
| zonal_apply / apply | Fully lazy | map_blocks per zone |
| regions | Fully materialized | Connected-component labeling |
| trim | Fully lazy | Lazy slicing |
| crop | Fully lazy | Lazy slicing |

Pathfinding

| Function | Laziness | Notes |
| --- | --- | --- |
| a_star_search | Fully materialized | A* needs random access and visited-set tracking |
| multi_stop_search | Fully materialized | Iterative A* |

Vector conversion

| Function | Laziness | Notes |
| --- | --- | --- |
| polygonize | Fully materialized | Polygonizes each chunk, then merges polygons across chunk edges |

polygonize runs per chunk and stitches the results, so it never holds the whole raster in memory at once. Chunks are polygonized in batches: each batch is one dask.compute call, so dask schedules the chunks in a batch in parallel instead of computing them one at a time. Peak memory is bounded by one batch's worth of per-chunk polygons plus the boundary polygons that accumulate for the merge. Larger inputs with many small chunks parallelize well; rasters whose chunks each produce many polygons use more memory per batch.
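The batching idea can be sketched with dask.delayed. This shows only the scheduling pattern, not polygonize itself: `process_chunk` is a stand-in that sums each chunk where the real function would produce polygons, and `batch_size` is a hypothetical parameter.

```python
import dask
import dask.array as da
import numpy as np

raster = da.from_array(np.arange(64).reshape(8, 8), chunks=(4, 4))


@dask.delayed
def process_chunk(block):
    """Stand-in for per-chunk polygonization; returns a small summary."""
    return int(block.sum())


# One delayed task per chunk; nothing runs yet.
tasks = [process_chunk(block) for block in raster.to_delayed().ravel()]

batch_size = 2  # hypothetical; bounds how many chunk results exist at once
per_chunk = []
for start in range(0, len(tasks), batch_size):
    batch = tasks[start:start + batch_size]
    # One dask.compute call per batch: chunks within it run in parallel.
    per_chunk.extend(dask.compute(*batch))

print(len(per_chunk), sum(per_chunk))
```

A larger batch gives the scheduler more parallel work per call but holds more per-chunk results in memory at the same time.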