xrspatial.sieve.sieve
- xrspatial.sieve.sieve(raster: DataArray, threshold: int = 10, neighborhood: int = 4, skip_values: Sequence[float] | None = None, name: str = 'sieve') → DataArray
Remove small connected regions from a classified raster.
Identifies connected components of same-value pixels and replaces regions smaller than threshold pixels with the value of their largest spatial neighbor that is already at or above threshold. Regions whose only neighbors are also below threshold are left unchanged, matching GDAL’s single-pass semantics. NaN pixels are always preserved.
- Parameters:
raster (xr.DataArray) – 2D classified or categorical raster.
threshold (int, default=10) – Minimum region size in pixels. Regions with fewer pixels are replaced by their largest neighbor’s value.
neighborhood (int, default=4) – Pixel connectivity: 4 (rook) or 8 (queen).
skip_values (sequence of float, optional) – Category values whose regions are never replaced, regardless of size. These regions can still serve as merge targets for neighboring small regions.
name (str, default='sieve') – Output DataArray name.
- Returns:
Sieved raster with the same shape, dims, coords, and attrs as the input.
- Return type:
xr.DataArray
Examples
>>> import numpy as np
>>> import xarray as xr
>>> from xrspatial.sieve import sieve
>>> # Classified raster with salt-and-pepper noise
>>> arr = np.array([[1, 1, 1, 2, 2],
...                 [1, 3, 1, 2, 2],
...                 [1, 1, 1, 2, 2],
...                 [2, 2, 2, 2, 2],
...                 [2, 2, 2, 2, 2]], dtype=np.float64)
>>> raster = xr.DataArray(arr, dims=['y', 'x'])
>>> # Remove regions smaller than 2 pixels
>>> result = sieve(raster, threshold=2)
>>> print(result.values)
[[1. 1. 1. 2. 2.]
 [1. 1. 1. 2. 2.]
 [1. 1. 1. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]]
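The neighborhood parameter changes what counts as one region, which in turn changes which regions fall below threshold. The following is a minimal pure-Python sketch of that effect, not xrspatial code; count_regions and the ROOK/QUEEN offset lists are hypothetical names introduced only for illustration.

```python
from collections import deque

# Offsets for the two connectivity rules named by the `neighborhood` parameter.
ROOK = [(-1, 0), (1, 0), (0, -1), (0, 1)]                 # 4-connectivity
QUEEN = ROOK + [(-1, -1), (-1, 1), (1, -1), (1, 1)]       # 8-connectivity


def count_regions(grid, offsets):
    """Count connected regions of equal-valued cells (hypothetical helper)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # Unvisited cell: start a new region and flood-fill it.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and grid[ny][nx] == grid[y][x]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


# Each pixel touches its own class only at a corner.
checker = [[1, 2],
           [2, 1]]

rook_count = count_regions(checker, ROOK)    # four single-pixel regions
queen_count = count_regions(checker, QUEEN)  # two two-pixel regions
```

With a threshold of 2, every pixel of this grid is its own below-threshold region under rook connectivity, whereas under queen connectivity both regions already reach the threshold.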
Notes
Uses single-pass semantics matching GDAL’s GDALSieveFilter. A small region is only merged into a neighbor whose current size is >= threshold. If no such neighbor exists, the region is left unchanged.
This is a global operation: for dask-backed arrays the entire raster is computed into memory before sieving. Connected-component labeling cannot be performed on individual chunks because regions may span chunk boundaries.
The CuPy backends use a CPU fallback for the merge step, which is inherently serial.
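The single-pass rule described in these notes can be illustrated with a small pure-Python sketch. This is not the xrspatial implementation: label_regions and sieve_sketch are hypothetical helpers, only 4-connectivity is handled, skip_values is ignored, "largest neighbor" is assumed to mean the adjacent region with the most pixels, and ties between equally large neighbors are broken arbitrarily.

```python
import math
from collections import deque

ROOK = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def label_regions(grid):
    """Label 4-connected regions of equal value; NaN cells keep label -1."""
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    sizes = []  # sizes[label] = pixel count of that region
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1 or math.isnan(grid[r][c]):
                continue
            lab = len(sizes)
            labels[r][c] = lab
            queue, count = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                count += 1
                for dy, dx in ROOK:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and grid[ny][nx] == grid[y][x]):
                        labels[ny][nx] = lab
                        queue.append((ny, nx))
            sizes.append(count)
    return labels, sizes


def sieve_sketch(grid, threshold):
    """Merge each small region into its largest neighbor of size >= threshold."""
    rows, cols = len(grid), len(grid[0])
    labels, sizes = label_regions(grid)
    # target[small_label] = (neighbor_label, neighbor_value).
    # Sizes are measured once and never updated: that is the single pass.
    target = {}
    for y in range(rows):
        for x in range(cols):
            lab = labels[y][x]
            if lab == -1 or sizes[lab] >= threshold:
                continue
            for dy, dx in ROOK:
                ny, nx = y + dy, x + dx
                if not (0 <= ny < rows and 0 <= nx < cols):
                    continue
                other = labels[ny][nx]
                if other in (-1, lab) or sizes[other] < threshold:
                    continue
                best = target.get(lab)
                if best is None or sizes[other] > sizes[best[0]]:
                    target[lab] = (other, grid[ny][nx])
    out = [row[:] for row in grid]
    for y in range(rows):
        for x in range(cols):
            if labels[y][x] in target:
                out[y][x] = target[labels[y][x]][1]
    return out


noisy = [[1.0, 1.0, 1.0, 2.0, 2.0],
         [1.0, 3.0, 1.0, 2.0, 2.0],
         [1.0, 1.0, 1.0, 2.0, 2.0],
         [2.0, 2.0, 2.0, 2.0, 2.0],
         [2.0, 2.0, 2.0, 2.0, 2.0]]
cleaned = sieve_sketch(noisy, threshold=2)    # the lone 3 becomes 1

# Two adjacent one-pixel regions: neither has a neighbor at or above
# the threshold, so both are left unchanged.
stuck = sieve_sketch([[1.0, 2.0]], threshold=2)
```

Because region sizes are never recomputed after a merge, a small region that borders only other small regions stays as it is, which is the behavior the notes attribute to GDALSieveFilter.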
See also
xrspatial.zonal.regions – Connected-component labeling.
xrspatial.classify.natural_breaks – Classification that may produce noisy output suitable for sieving.