xrspatial.utils.rechunk_no_shuffle
- xrspatial.utils.rechunk_no_shuffle(agg, target_mb=128)
Rechunk a dask-backed DataArray or Dataset without triggering a shuffle.
Computes an integer multiplier per dimension so that each new chunk is an exact multiple of the original chunk size. This lets dask merge whole source chunks in place rather than splitting and recombining partial blocks, which is effectively a shuffle.
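The multiplier arithmetic can be sketched in plain Python. This is an assumption about how a multiplier might be chosen, not xrspatial's actual code: `no_shuffle_chunk_shape` is a hypothetical helper that applies one shared integer multiplier m to every dimension, picks the largest m whose scaled chunk stays within the target, and treats 1 MB as 1024 * 1024 bytes.

```python
import math


def no_shuffle_chunk_shape(chunk_shape, itemsize, target_mb):
    """Hypothetical helper: scale every source chunk edge by one integer m.

    Assumes a single multiplier shared by all dimensions, chosen as the
    largest m with m**ndim * source_chunk_bytes <= target bytes.
    """
    if target_mb <= 0:
        raise ValueError("target_mb must be positive")
    ndim = len(chunk_shape)
    source_bytes = itemsize * math.prod(chunk_shape)
    target_bytes = target_mb * 1024 * 1024  # assumption: 1 MB = 2**20 bytes
    # Initial estimate from the ndim-th root; never go below 1 (chunks never shrink).
    m = max(1, int((target_bytes / source_bytes) ** (1.0 / ndim)))
    # Correct floating-point error in the root so m is exactly the largest fit.
    while m > 1 and m ** ndim * source_bytes > target_bytes:
        m -= 1
    while (m + 1) ** ndim * source_bytes <= target_bytes:
        m += 1
    # Each new edge is an integer multiple of the source edge.
    return tuple(m * c for c in chunk_shape)


# float64 chunks of 256 x 256 (0.5 MB each) scaled toward a 64 MB target.
print(no_shuffle_chunk_shape((256, 256), 8, 64))
```

Under these assumptions the 256 x 256 float64 chunks grow by m = 11 per dimension (121 source chunks, about 60.5 MB). The real function may distribute the multiplier differently across dimensions.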
- Parameters:
agg (xr.DataArray or xr.Dataset) – Input raster(s). If not backed by a dask array the input is returned unchanged. For Datasets, each variable is rechunked independently.
target_mb (int or float) – Target chunk size in megabytes. The actual chunk size will be the closest multiple of the source chunk that does not exceed this target. Default 128.
- Returns:
Rechunked object. Coordinates and attributes are preserved.
- Return type:
xr.DataArray or xr.Dataset
- Raises:
TypeError – If agg is not an xr.DataArray or xr.Dataset.
ValueError – If target_mb is not positive.
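The contract in the fields above (type and value checks, pass-through for non-dask input, independent handling of Dataset variables) can be sketched end to end. `rechunk_no_shuffle_sketch` is a hypothetical stand-in, not xrspatial's source; it assumes regular source chunks, one multiplier shared by all dimensions, and 1 MB = 2**20 bytes.

```python
import math

import xarray as xr


def rechunk_no_shuffle_sketch(agg, target_mb=128):
    """Hypothetical sketch of the documented contract; not xrspatial's code."""
    if not isinstance(agg, (xr.DataArray, xr.Dataset)):
        raise TypeError("agg must be an xr.DataArray or xr.Dataset")
    if target_mb <= 0:
        raise ValueError("target_mb must be positive")
    if isinstance(agg, xr.Dataset):
        # Shallow copy keeps coordinates and attrs; each variable is rechunked alone.
        out = agg.copy()
        for name, var in agg.data_vars.items():
            out[name] = rechunk_no_shuffle_sketch(var, target_mb)
        return out
    if agg.chunks is None or agg.ndim == 0 or agg.size == 0:
        # Not dask-backed (or nothing to scale): return the input unchanged.
        return agg
    # Assumes regular source chunks; chunksize is the largest block per dimension.
    source = agg.data.chunksize
    source_bytes = agg.dtype.itemsize * math.prod(source)
    target_bytes = target_mb * 1024 * 1024
    m = max(1, int((target_bytes / source_bytes) ** (1.0 / agg.ndim)))
    while m > 1 and m ** agg.ndim * source_bytes > target_bytes:
        m -= 1
    while (m + 1) ** agg.ndim * source_bytes <= target_bytes:
        m += 1
    if m == 1:
        return agg
    # Integer multiples of the source chunk, so whole blocks merge in place.
    return agg.chunk({dim: m * c for dim, c in zip(agg.dims, source)})
```

Copying the Dataset and reassigning each variable, instead of rebuilding it from scratch, is one way to keep coordinates that are not attached to any data variable.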
Examples
>>> import dask.array as da
>>> import xarray as xr
>>> from xrspatial.utils import rechunk_no_shuffle
>>> arr = xr.DataArray(da.zeros((4096, 4096), chunks=256))
>>> big = rechunk_no_shuffle(arr, target_mb=64)
>>> big.chunks  # chunk sizes are multiples of 256
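One way to confirm that a rechunk merged whole source chunks is to check that every new chunk boundary along each dimension is also an old boundary. The helpers `chunk_boundaries` and `is_no_shuffle` are hypothetical and not part of xrspatial; they take dask-style tuples of tuples such as `DataArray.chunks`.

```python
from itertools import accumulate


def chunk_boundaries(sizes):
    """Cumulative end offsets of the chunks along one dimension."""
    return set(accumulate(sizes))


def is_no_shuffle(old_chunks, new_chunks):
    """True if, per dimension, every new chunk boundary is an old one."""
    if len(old_chunks) != len(new_chunks):
        return False
    return all(
        chunk_boundaries(new) <= chunk_boundaries(old)
        for old, new in zip(old_chunks, new_chunks)
    )


old = ((256,) * 16, (256,) * 16)      # 4096 x 4096 in 256 x 256 blocks
merged = ((2816, 1280), (2816, 1280))  # 11 and 5 whole blocks per chunk
split = ((300, 3796), (4096,))         # 300 cuts through a source block
print(is_no_shuffle(old, merged), is_no_shuffle(old, split))
```

With the example above, `is_no_shuffle(arr.chunks, big.chunks)` should return True if the rechunk only grouped whole 256-element blocks.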