xrspatial.classify.quantile

xrspatial.classify.quantile(agg: DataArray, k: int = 4, num_sample: int | None = 20000, name: str | None = 'quantile') → DataArray

Reclassifies the values of agg into k quantile groups of approximately equal size.

Parameters:

agg (xr.DataArray or xr.Dataset) – 2D NumPy, CuPy, NumPy-backed Dask, or CuPy-backed Dask array of values to be reclassified.
k (int, default=4) – Number of quantiles to be produced.
num_sample (int or None, default=20000) – Number of sample data points used to compute percentile breakpoints. For dask-backed arrays the sample is drawn lazily to avoid materialising the entire array into RAM. None means use all data (safe for numpy/cupy, automatically capped for dask).
name (str, default='quantile') – Name of the output aggregate array.

Returns:

quantile_agg – 2D aggregate array of quantile allocations. All other input attributes are preserved. If agg is a Dataset, returns a Dataset with each variable classified independently.

Return type:

xr.DataArray or xr.Dataset

Notes

Dask’s percentile algorithm is approximate, while NumPy’s is exact. This may cause some differences between the results for the NumPy-backed and Dask-backed versions of the input agg (dask/dask#3099).
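
Sampling through num_sample is a second source of approximation: breakpoints computed from a subset of the data are close to, but generally not identical to, breakpoints computed from every value. The sketch below illustrates this with plain NumPy. It is not xrspatial's implementation; the synthetic data, the fixed seed, and the sampling strategy (uniform random draws without replacement) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
data = rng.uniform(0.0, 100.0, size=1_000_000)  # synthetic stand-in for a raster

k = 4
q = np.linspace(0, 100, k + 1)[1:]  # upper edge of each group: 25, 50, 75, 100

# Breakpoints computed from every value.
exact_edges = np.percentile(data, q)

# Breakpoints computed from a random sample (assumed sampling strategy).
num_sample = 20_000
sample = rng.choice(data, size=num_sample, replace=False)
sample_edges = np.percentile(sample, q)

# Largest disagreement between the two sets of breakpoints.
max_gap = np.abs(exact_edges - sample_edges).max()
print("exact: ", exact_edges)
print("sample:", sample_edges)
print("largest gap:", max_gap)
```

Values that lie between an exact breakpoint and its sampled counterpart are the ones that may land in a different group, so small discrepancies near class boundaries are expected whenever a sample is used.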

References

PySAL: https://pysal.org/mapclassify/_modules/mapclassify/classifiers.html#Quantiles

Examples

Quantile works with a NumPy-backed xarray DataArray:

>>> import numpy as np
>>> import xarray as xr
>>> from xrspatial.classify import quantile

>>> elevation = np.array([
...     [np.nan,  1.,  2.,  3.,  4.],
...     [ 5.,  6.,  7.,  8.,  9.],
...     [10., 11., 12., 13., 14.],
...     [15., 16., 17., 18., 19.],
...     [20., 21., 22., 23., np.inf]
... ])
>>> agg_numpy = xr.DataArray(elevation, attrs={'res': (10.0, 10.0)})
>>> numpy_quantile = quantile(agg_numpy, k=5)
>>> print(numpy_quantile)
<xarray.DataArray 'quantile' (dim_0: 5, dim_1: 5)>
array([[nan,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  4.],
       [ 4.,  4.,  4.,  4., nan]], dtype=float32)
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)
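
To make the idea of quantile groups concrete, the same labels can be worked out by hand with plain NumPy. The sketch below is not xrspatial's implementation: quantile_bins is a hypothetical helper, and computing linearly interpolated percentiles over the finite values and using right-closed bins are assumptions, chosen because they reproduce the output shown for this example array.

```python
import numpy as np


def quantile_bins(values, k=4):
    """Hypothetical helper: label each finite value with a group index 0..k-1."""
    values = np.asarray(values, dtype=float)
    finite = np.isfinite(values)  # NaN and +/-inf are left unclassified

    # Upper edge of each group: the 100/k, 200/k, ..., 100th percentiles.
    q = np.linspace(0, 100, k + 1)[1:]
    edges = np.percentile(values[finite], q)

    labels = np.full(values.shape, np.nan, dtype=np.float32)
    # right=True: a value equal to an edge belongs to the lower group.
    labels[finite] = np.digitize(values[finite], edges, right=True)
    return labels


elevation = np.array([
    [np.nan,  1.,  2.,  3.,  4.],
    [ 5.,  6.,  7.,  8.,  9.],
    [10., 11., 12., 13., 14.],
    [15., 16., 17., 18., 19.],
    [20., 21., 22., 23., np.inf],
])

labels = quantile_bins(elevation, k=5)
print(labels)
```

With the 23 finite values 1 to 23 and k=5, the breakpoints are 5.4, 9.8, 14.2, 18.6, and 23, so the groups hold 5, 4, 5, 4, and 5 values. The groups are as close to equal in size as the data allows rather than exactly equal.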