xrspatial.utils.rechunk_no_shuffle

xrspatial.utils.rechunk_no_shuffle(agg, target_mb=128)

Rechunk a dask-backed DataArray or Dataset without triggering a shuffle.

Computes an integer multiplier per dimension so that each new chunk is an exact multiple of the original chunk size. This lets dask merge whole source chunks in place instead of splitting partial blocks and recombining them (which amounts to a shuffle).
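To see why whole-chunk multiples avoid a shuffle, compare chunk boundaries along a single dimension. The sketch below is plain Python; `merged_chunks` and `boundaries` are illustrative helpers, not part of xrspatial or dask. When every new boundary is also an old boundary, each output chunk is built from whole input chunks. A size such as 700, which is not a multiple of 256, cuts through source chunks instead.

```python
from itertools import accumulate


def merged_chunks(src_chunks, k):
    """Illustrative helper: group every k consecutive source chunks along one dimension."""
    return tuple(sum(src_chunks[i:i + k]) for i in range(0, len(src_chunks), k))


def boundaries(chunks):
    """Cumulative chunk edges along one dimension, including 0."""
    return {0, *accumulate(chunks)}


src = (256,) * 16             # 4096 elements in chunks of 256
good = merged_chunks(src, 3)  # merge 3 whole source chunks at a time
bad = (700,) * 5 + (596,)     # same total length, but edges fall mid-chunk

print(good)                                # (768, 768, 768, 768, 768, 256)
print(boundaries(good) <= boundaries(src))  # True: every new edge is an old edge
print(boundaries(bad) <= boundaries(src))   # False: 700 splits a source chunk
```

The trailing chunk (256 here) is shorter when the number of source chunks is not divisible by the multiplier, but its edges still line up with the source grid.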

Parameters:
  • agg (xr.DataArray or xr.Dataset) – Input raster(s). If the input is not backed by a dask array, it is returned unchanged. For Datasets, each variable is rechunked independently.

  • target_mb (int or float) – Target chunk size in megabytes. The resulting chunk size is the largest multiple of the source chunk size that does not exceed this target. Default 128.
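One way to turn the `target_mb` budget into per-dimension multipliers is sketched below. This is a minimal sketch, not xrspatial's actual code: `no_shuffle_multipliers` is a hypothetical helper, it assumes 1 MB = 2**20 bytes, and it grows every dimension by the same integer factor, which the real function may not do. It also assumes that a source chunk already larger than the target is left as is (multiplier 1).

```python
import math


def no_shuffle_multipliers(shape, chunk_shape, itemsize, target_mb=128):
    """Hypothetical helper (not the xrspatial implementation).

    Returns one integer multiplier per dimension so that a merged chunk
    spans whole source chunks and occupies at most ``target_mb`` megabytes.
    Assumes 1 MB = 2**20 bytes and grows every dimension by the same factor.
    """
    if target_mb <= 0:
        raise ValueError("target_mb must be positive")
    src_bytes = itemsize * math.prod(chunk_shape)
    # Number of whole source chunks that fit in the target budget (at least 1).
    ratio = max(1, int(target_mb * 2**20 // src_bytes))
    ndim = len(chunk_shape)
    # Largest integer k with k**ndim <= ratio; the loops correct float rounding.
    k = max(1, int(ratio ** (1.0 / ndim)))
    while (k + 1) ** ndim <= ratio:
        k += 1
    while k > 1 and k ** ndim > ratio:
        k -= 1
    # Never merge more chunks than exist along a dimension.
    return tuple(min(k, math.ceil(n / c)) for n, c in zip(shape, chunk_shape))


mult = no_shuffle_multipliers((4096, 4096), (256, 256), itemsize=8, target_mb=64)
new_chunk = tuple(m * c for m, c in zip(mult, (256, 256)))
print(mult, new_chunk)                   # (11, 11) (2816, 2816)
print(8 * math.prod(new_chunk) / 2**20)  # 60.5, under the 64 MB target
```

Under these assumptions, 256 x 256 float64 source chunks (0.5 MB each) fit 128 times into 64 MB, so each dimension grows by 11 (11 x 11 = 121 <= 128, while 12 x 12 = 144 would overshoot).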

Returns:

Rechunked object. Coordinates and attributes are preserved.

Return type:

xr.DataArray or xr.Dataset

Raises:
  • TypeError – If agg is not an xr.DataArray or xr.Dataset.

  • ValueError – If target_mb is not positive.

Examples

>>> import dask.array as da
>>> import xarray as xr
>>> from xrspatial.utils import rechunk_no_shuffle
>>> arr = xr.DataArray(da.zeros((4096, 4096), chunks=256))
>>> big = rechunk_no_shuffle(arr, target_mb=64)
>>> big.chunks  # multiples of 256