xrspatial.polygonize.polygonize

xrspatial.polygonize.polygonize(raster: DataArray, mask: DataArray | None = None, connectivity: int = 4, transform: ndarray | None = None, column_name: str = 'DN', return_type: str = 'numpy', simplify_tolerance: float | None = None, simplify_method: str = 'douglas-peucker')

Polygonize creates vector polygons for connected regions of pixels in a raster that share the same pixel value. It is a raster-to-vector converter.
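To make "connected regions of pixels that share the same pixel value" concrete, the following is a minimal pure-Python sketch that counts such regions under 4- and 8-connectivity. It is not xrspatial's implementation and does no boundary tracing; `count_regions` and the neighbour tables are hypothetical names used only for illustration.

```python
from collections import deque

# Neighbour offsets (row, col) for the two connectivity modes.
NEIGHBOURS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBOURS_8 = NEIGHBOURS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_regions(grid, connectivity=4):
    """Count connected regions of equal-valued pixels (illustrative only)."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    offsets = NEIGHBOURS_4 if connectivity == 4 else NEIGHBOURS_8
    nrows, ncols = len(grid), len(grid[0])
    seen = [[False] * ncols for _ in range(nrows)]
    regions = 0
    for r in range(nrows):
        for c in range(ncols):
            if seen[r][c]:
                continue
            # Unvisited pixel: start a new region and flood-fill it.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < nrows and 0 <= nc < ncols
                            and not seen[nr][nc]
                            and grid[nr][nc] == grid[r][c]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return regions


# A 2x2 checkerboard: equal values touch only at the shared corner.
grid = [
    [1, 0],
    [0, 1],
]
regions_4 = count_regions(grid, connectivity=4)  # 4: every pixel is its own region
regions_8 = count_regions(grid, connectivity=8)  # 2: diagonal neighbours are joined
```

In the checkerboard, 8-connectivity merges pixels that meet only at a corner, so each merged region touches itself at a single point. This is the kind of shape for which a polygon may not be valid.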

Parameters:
  • raster (xr.DataArray) – Input raster.

  • mask (xr.DataArray, optional) – Optional input mask. Pixels to include should have mask values of 1 or True; pixels to exclude should have 0 or False. This is the opposite of the NumPy masked-array convention, where True marks an element as masked out.

  • connectivity (int, default=4) –

    Whether to use 4-connectivity (pixels are adjacent only along a shared edge) or 8-connectivity (pixels are adjacent along a shared edge or diagonally) to determine which pixels are connected. Connectivity of 4 returns valid polygons (by Shapely’s definition) provided the x and y coordinates are each monotonically increasing or decreasing. Connectivity of 8 does not necessarily return valid polygons.

    Note: when using Dask arrays, 8-connectivity may produce extra polygon splits at chunk corners where diagonal-only adjacency crosses a chunk boundary. 4-connectivity is not affected, because it never relies on diagonal adjacency.

  • transform (ndarray, optional) – Optional affine transform to apply to the returned polygon coordinates.

  • column_name (str, default="DN") – Name of the column that holds the pixel values in the returned data. Only used if return_type is “geopandas” or “spatialpandas”.

  • return_type (str, default="numpy") – Format of returned data. Allowed values are “numpy”, “spatialpandas”, “geopandas”, “awkward” and “geojson”. “numpy” and “geojson” are always available; the others require optional dependencies.

  • simplify_tolerance (float, optional) –

    Simplification tolerance in coordinate units. When set, polygon boundaries are simplified using shared-edge decomposition to preserve topology between adjacent polygons. Default is None (no simplification).

    For "douglas-peucker", this is the maximum perpendicular distance a vertex may deviate from the simplified line.

    For "visvalingam-whyatt", this is the minimum triangle area threshold; vertices forming triangles smaller than this are removed.

  • simplify_method (str, default="douglas-peucker") – Simplification algorithm. Options are "douglas-peucker" (distance-based, good for general use) and "visvalingam-whyatt" (area-based, tends to produce better cartographic results).
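Because simplify_tolerance is a distance for "douglas-peucker" but an area for "visvalingam-whyatt", the same number can have different effects under each method. The sketch below computes both quantities for a single vertex, following the descriptions above. It is not the library's code, it ignores the shared-edge decomposition, and `perpendicular_distance` and `triangle_area` are hypothetical helper names.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the infinite line through start and end."""
    (px, py), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - x1, py - y1)
    return abs(dy * (px - x1) - dx * (py - y1)) / length


def triangle_area(a, b, c):
    """Area of the triangle formed by a vertex b and its two neighbours."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    return abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2.0


# Vertex b sits one unit above the straight line from a to c.
a, b, c = (0.0, 0.0), (2.0, 1.0), (4.0, 0.0)
distance = perpendicular_distance(b, a, c)  # 1.0 coordinate unit
area = triangle_area(a, b, c)               # 2.0 square coordinate units

tolerance = 1.5
# Distance-based criterion: b deviates less than the tolerance, so it can go.
dropped_by_distance = distance < tolerance  # True
# Area-based criterion: b's triangle is not smaller than the tolerance, so it stays.
dropped_by_area = area < tolerance          # False
```

With a tolerance of 1.5, the distance criterion would drop the middle vertex while the area criterion would keep it, so a tolerance tuned for one method should not be assumed to carry over to the other.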

Returns:

  • Polygons and their corresponding values, in a format determined by return_type.

Notes

CuPy and Dask+CuPy arrays are accepted as input. Data is transferred to the CPU for processing because boundary tracing is an inherently sequential graph traversal (each step depends on the previous turn direction), which prevents GPU parallelism. Output is always CPU-side NumPy coordinate arrays regardless of input type.

For Dask+CuPy, each chunk is transferred independently, keeping peak CPU memory proportional to chunk size rather than full raster size.