xrspatial.classify.natural_breaks

xrspatial.classify.natural_breaks(agg: DataArray, k: int = 5, num_sample: int | None = 20000, name: str | None = 'natural_breaks') → DataArray

Reclassifies the values of agg into k classes using the Natural Breaks (Jenks) or K-Means clustering method. Values are grouped so that similar values fall in the same class and the separation between classes is maximized.

Parameters:
  • agg (xr.DataArray or xr.Dataset) – 2D NumPy, CuPy, NumPy-backed Dask, or CuPy-backed Dask array of values to be reclassified.

  • k (int, default=5) – Number of classes to be produced.

  • num_sample (int or None, default=20000) – Number of sample data points used to fit the model. Natural Breaks (Jenks) classification has O(n²) complexity, where n is the total number of data points, i.e. agg.size. When n is large, the model should be fit on a small sub-sample of the data instead of the whole dataset. None means fit on all data rather than a sub-sample; this is the full O(n²) case described above, so it may be slow, and it raises MemoryError if the Jenks matrices would exceed half of the available RAM. For Dask, the full sample is drawn lazily via indexed access.

  • name (str, default='natural_breaks') – Name of output aggregate.
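To make the cost trade-off behind num_sample concrete, here is a minimal pure-Python sketch of a Jenks-style dynamic program combined with sub-sampling. It is not xrspatial's implementation: jenks_breaks and the toy data are hypothetical names chosen for illustration. The nested loops over j and i are where the quadratic growth in n comes from, which is why fitting on a sample is much cheaper than fitting on every value.

```python
import random


def jenks_breaks(values, k):
    """Hypothetical helper: upper bound of each of k classes that
    minimizes the total within-class sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        # Sum of squared deviations of data[i:j], with j > i.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting data[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):          # O(n) end positions ...
            for i in range(c - 1, j):      # ... times O(n) start positions
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk the split table backwards to recover the class upper bounds.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(data[j - 1])
        j = split[c][j]
    return bounds[::-1]


# Small, clearly clustered input: the breaks fall at the cluster edges.
small_breaks = jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)
print(small_breaks)  # [3, 12, 22]

# Sub-sampling: fit on 30 of 300 values instead of all of them.
values = list(range(100)) + list(range(1000, 1100)) + list(range(5000, 5100))
sample = random.Random(0).sample(values, 30)
sample_breaks = jenks_breaks(sample, 3)
```

Fitting on 30 points rather than 300 cuts the inner-loop work by roughly a factor of 100 in this sketch. The breaks obtained from a sample are an approximation of the breaks for the full data.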

Returns:

natural_breaks_agg – 2D aggregate array of natural break allocations. All other input attributes are preserved. If agg is a Dataset, returns a Dataset with each variable classified independently.

Return type:

xr.DataArray or xr.Dataset
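As a rough illustration of what "natural break allocations" means, the sketch below maps individual values to class indices given a list of class upper bounds. The bounds are an assumption inferred from the output of the NumPy example in the Examples section, not values read from the library, and assign_class is a hypothetical helper rather than part of xrspatial.

```python
import math
from bisect import bisect_left

# Assumption: class upper bounds inferred from the documented example output.
bins = [4.0, 8.0, 13.0, 18.0, 23.0]


def assign_class(value, bins):
    """Hypothetical helper: index of the first bin whose upper bound
    is >= value; non-finite inputs (NaN, inf) map to NaN."""
    if not math.isfinite(value):
        return math.nan
    return float(bisect_left(bins, value))


row = [math.nan, 4.0, 5.0, 9.0, 14.0, 23.0, math.inf]
classes = [assign_class(v, bins) for v in row]
print(classes)  # [nan, 0.0, 1.0, 2.0, 3.0, 4.0, nan]
```

In this sketch a value equal to an upper bound belongs to that bound's class, which matches the example output where 4 is in class 0 and 5 is in class 1.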

Examples

natural_breaks() works with a NumPy-backed xarray DataArray:

>>> import numpy as np
>>> import xarray as xr
>>> from xrspatial.classify import natural_breaks
>>> elevation = np.array([
...     [np.nan,  1.,  2.,  3.,  4.],
...     [ 5.,  6.,  7.,  8.,  9.],
...     [10., 11., 12., 13., 14.],
...     [15., 16., 17., 18., 19.],
...     [20., 21., 22., 23., np.inf]
... ])
>>> agg_numpy = xr.DataArray(elevation, attrs={'res': (10.0, 10.0)})
>>> numpy_natural_breaks = natural_breaks(agg_numpy, k=5)
>>> print(numpy_natural_breaks)
<xarray.DataArray 'natural_breaks' (dim_0: 5, dim_1: 5)>
array([[nan,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  2.],
       [ 2.,  2.,  2.,  2.,  3.],
       [ 3.,  3.,  3.,  3.,  4.],
       [ 4.,  4.,  4.,  4., nan]], dtype=float32)
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)

natural_breaks() works with a CuPy-backed xarray DataArray:

>>> import cupy
>>> agg_cupy = xr.DataArray(cupy.asarray(elevation))
>>> cupy_natural_breaks = natural_breaks(agg_cupy)
>>> print(type(cupy_natural_breaks))
<class 'xarray.core.dataarray.DataArray'>
>>> print(cupy_natural_breaks)
<xarray.DataArray 'natural_breaks' (dim_0: 5, dim_1: 5)>
array([[nan,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  2.],
       [ 2.,  2.,  2.,  2.,  3.],
       [ 3.,  3.,  3.,  3.,  4.],
       [ 4.,  4.,  4.,  4., nan]], dtype=float32)
Dimensions without coordinates: dim_0, dim_1