xrspatial.polygonize.polygonize

xrspatial.polygonize.polygonize(raster: DataArray, mask: DataArray | None = None, connectivity: int = 4, transform: ndarray | None = None, column_name: str = 'DN', return_type: str = 'numpy', simplify_tolerance: float | None = None, simplify_method: str = 'douglas-peucker', atol: float = 1e-08, rtol: float = 1e-05) → Tuple[List[int | float], List[List[ndarray]]] | Tuple[List[int | float], ak.Array] | gpd.GeoDataFrame | spatialpandas.GeoDataFrame | Dict[str, Any]

Polygonize creates a vector polygon for each connected region of raster pixels that share the same value, converting raster data to vector data.
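The grouping step can be pictured as connected-component labeling. The sketch below is a hypothetical helper, label_regions, and not xrspatial's implementation: it uses exact equality (ignoring the float tolerance described next), takes a connectivity argument mirroring the parameter of the same name, and stops before boundary tracing.

```python
from collections import deque


def label_regions(raster, connectivity=4):
    """Label connected regions of equal-valued pixels (hypothetical helper).

    raster is a list of rows; returns a same-shaped list of integer labels.
    """
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    if not raster:
        return []
    # 4-connectivity: neighbours that share an edge.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        # 8-connectivity: also neighbours that share only a corner.
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    nrows, ncols = len(raster), len(raster[0])
    labels = [[-1] * ncols for _ in range(nrows)]
    next_label = 0
    for r0 in range(nrows):
        for c0 in range(ncols):
            if labels[r0][c0] != -1:
                continue
            # Breadth-first flood fill from an unlabelled seed pixel.
            value = raster[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < nrows and 0 <= nc < ncols
                            and labels[nr][nc] == -1
                            and raster[nr][nc] == value):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels


checkerboard = [[1, 0], [0, 1]]
print(label_regions(checkerboard, connectivity=4))  # [[0, 1], [2, 3]]
print(label_regions(checkerboard, connectivity=8))  # [[0, 1], [1, 0]]
```

With 8-connectivity the two diagonal 1-pixels form one region that touches itself only at a corner, which is one way a polygon can fail shapely's validity rules.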

For integer rasters, “same value” means strict equality. For float rasters, adjacent pixels are grouped when their values agree within a small numerical tolerance (controlled by atol and rtol), so floating-point noise from upstream arithmetic does not split otherwise identical regions. See the atol / rtol parameters below for the formula and for how to opt into strict float equality.
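A minimal sketch of the documented grouping test, assuming a hypothetical helper same_value whose integer_raster flag stands in for checking the raster's dtype (checking the dtype rather than each value's Python type avoids misclassifying NumPy integer scalars):

```python
def same_value(a, b, integer_raster, atol=1e-8, rtol=1e-5):
    """Decide whether two adjacent pixel values are merged (hypothetical helper).

    Integer rasters use strict equality; float rasters use the documented
    formula abs(a - b) <= atol + rtol * abs(a).
    """
    if integer_raster:
        return a == b
    return abs(a - b) <= atol + rtol * abs(a)


noisy = 0.1 + 0.2  # 0.30000000000000004 after binary rounding
print(same_value(noisy, 0.3, integer_raster=False))                     # True
print(same_value(noisy, 0.3, integer_raster=False, atol=0.0, rtol=0.0))  # False
print(same_value(1000000, 1000001, integer_raster=True))                 # False
```

The last case hints at why the tolerance is not applied to integer labels: with the default rtol, 1000000.0 and 1000001.0 would be merged as floats, since rtol * 1e6 = 10.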

Parameters:
  • raster (xr.DataArray) – Input raster.

  • mask (xr.DataArray, optional) – Optional input mask. Pixels to include should have mask values of 1 or True; pixels to exclude should have 0 or False. This is the opposite of the NumPy masked-array convention, in which True marks a value to exclude.

  • connectivity (int, default=4) – Whether to use 4-connectivity (pixels sharing an edge) or 8-connectivity (pixels sharing an edge or a corner) to determine which pixels are connected. Connectivity of 4 returns valid polygons (by shapely’s definition), provided the x and y coordinates are each monotonically increasing or decreasing. Connectivity of 8 does not necessarily return valid polygons.

  • transform (ndarray, optional) – Optional affine transform to apply to the returned polygon coordinates. If omitted, a transform is auto-detected from the raster when possible (see Notes).

  • column_name (str, default="DN") – Name of the pixel-value column in the returned GeoDataFrame. Only used if return_type is “geopandas” or “spatialpandas”.

  • return_type (str, default="numpy") – Format of returned data. Allowed values are “numpy”, “spatialpandas”, “geopandas”, “awkward” and “geojson”. “numpy” and “geojson” are always available; the others require optional dependencies.

  • simplify_tolerance (float, optional) –

    Simplification tolerance in coordinate units. When set, polygon boundaries are simplified using shared-edge decomposition to preserve topology between adjacent polygons. Default is None (no simplification).

    For "douglas-peucker", this is the maximum perpendicular distance a vertex may deviate from the simplified line.

    For "visvalingam-whyatt", this is the minimum triangle area threshold; vertices forming triangles smaller than this are removed.

  • simplify_method (str, default="douglas-peucker") – Simplification algorithm. Options are "douglas-peucker" (distance-based, good for general use) and "visvalingam-whyatt" (area-based, tends to produce better cartographic results).

  • atol (float, default=1e-8) – Absolute tolerance used when grouping adjacent float pixels. Two adjacent float values a and b are considered the same value (and merged into one polygon) when abs(a - b) <= atol + rtol * abs(a). Has no effect on integer rasters, which always use strict equality. Pass atol=0.0 together with rtol=0.0 to opt into strict equality for float rasters as well (useful when float values encode discrete category labels). The default matches numpy.isclose’s default atol and is exported as xrspatial.polygonize._DEFAULT_ATOL.

  • rtol (float, default=1e-5) – Relative tolerance used together with atol (see atol). Has no effect on integer rasters. The default matches numpy.isclose’s default rtol and is exported as xrspatial.polygonize._DEFAULT_RTOL.

Returns:

  Polygons and their corresponding values, in a format determined by return_type:

  • “numpy” (default): a tuple (column, polygon_points), where column is a list of pixel values and polygon_points is a list of polygons, each polygon a list of Nx2 np.ndarray rings (exterior first, then holes).

  • “awkward”: a tuple (column, ak.Array) of polygon coordinates.

  • “geopandas”: a geopandas.GeoDataFrame with column_name and geometry columns.

  • “spatialpandas”: a spatialpandas.GeoDataFrame.

  • “geojson”: a dict representing a GeoJSON FeatureCollection.
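As a sketch of consuming the “numpy” result, the hypothetical helpers below total polygon area per pixel value with the shoelace formula, subtracting holes from each exterior. Plain lists of (x, y) tuples stand in for the Nx2 arrays; the same indexing works on Nx2 np.ndarray rings. The modular indexing gives the same area whether or not a ring repeats its first vertex at the end, and absolute values make the result independent of ring orientation. Areas are in squared coordinate units (CRS units when a transform applies).

```python
def ring_area(ring):
    """Absolute shoelace area of a ring given as a sequence of (x, y) pairs."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = ring[i][0], ring[i][1]
        x1, y1 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def area_by_value(column, polygon_points):
    """Sum polygon areas per pixel value from a (column, polygon_points) pair."""
    totals = {}
    for value, rings in zip(column, polygon_points):
        exterior, holes = rings[0], rings[1:]
        area = ring_area(exterior) - sum(ring_area(h) for h in holes)
        totals[value] = totals.get(value, 0.0) + area
    return totals


outer = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)]
column = [1, 2]
polygon_points = [[outer, hole], [hole]]  # value 2 fills the hole in value 1
print(area_by_value(column, polygon_points))  # {1: 15.0, 2: 1.0}
```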

Notes

CuPy and Dask+CuPy arrays are accepted as input. Data is transferred to CPU for processing because boundary tracing is an inherently sequential graph traversal (each step depends on the previous turn direction), preventing GPU parallelism. Output is always CPU-side numpy coordinate arrays regardless of input type.

For Dask+CuPy, each chunk is transferred independently, keeping peak CPU memory proportional to chunk size rather than full raster size.

When return_type="geopandas", the raster’s CRS is propagated to the output GeoDataFrame. The resolution order is raster.attrs['crs'], then raster.attrs['crs_wkt'], then raster.rio.crs (if rioxarray is installed). An unparseable CRS value is dropped rather than raising an error. The spatialpandas and geojson return types do not carry CRS metadata: spatialpandas has no CRS slot, and GeoJSON (RFC 7946) assumes WGS84 and has no CRS member.
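The resolution order above can be sketched as a simple fallback chain. resolve_crs is a hypothetical helper: attrs stands in for raster.attrs, rio_crs for raster.rio.crs, and parse for a CRS parser that raises on bad input (pyproj.CRS.from_user_input would be one candidate). It assumes the first value present is the one used and that a parse failure yields no CRS rather than falling through to the next source; the real behavior may differ.

```python
def resolve_crs(attrs, rio_crs=None, parse=lambda value: value):
    """Pick a CRS in the documented order (hypothetical helper)."""
    for candidate in (attrs.get("crs"), attrs.get("crs_wkt"), rio_crs):
        if candidate is None:
            continue  # source absent: try the next one
        try:
            return parse(candidate)
        except Exception:
            # Assumption: an unparseable value is dropped, not re-raised.
            return None
    return None


print(resolve_crs({"crs": "EPSG:32633", "crs_wkt": "PROJCS[...]"}))  # EPSG:32633
print(resolve_crs({}, rio_crs="EPSG:4326"))                          # EPSG:4326
```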

When transform is not supplied explicitly, the raster’s affine transform is auto-detected in this order: raster.attrs['transform'] (xrspatial.geotiff convention, a rasterio-ordered 6-tuple), then raster.rio.transform() (if rioxarray is installed), then the raster’s own x/y coordinate values (the xarray / xrspatial standard georeferencing convention; used when the coords are 1-D, length >= 2 and evenly spaced). An explicit transform= argument always overrides the auto-detected value. Auto-detection is skipped when the raster carries attrs['_xrspatial_no_georef']=True. This applies to all return types: the geometries themselves are transformed, so when the raster carries a transform, the coordinates emitted in the “numpy”, “awkward”, “spatialpandas” and “geojson” outputs are also in CRS coordinate space rather than pixel space.
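A sketch of the detection order, using hypothetical helpers. transform_from_coords derives a rasterio-ordered (a, b, c, d, e, f) tuple from 1-D x/y coordinates, assuming the coordinates mark pixel centres (the usual xarray convention), hence the half-pixel shift; rio_transform and explicit stand in for raster.rio.transform() and the transform= argument. Under the rasterio/affine ordering a pixel corner (col, row) maps to (a*col + b*row + c, d*col + e*row + f).

```python
import math


def transform_from_coords(x, y):
    """Rasterio-ordered transform from 1-D pixel-centre coords, or None."""
    def spacing(coords):
        if len(coords) < 2:
            return None
        step = coords[1] - coords[0]
        if step == 0:
            return None
        for lo, hi in zip(coords, coords[1:]):
            if not math.isclose(hi - lo, step, rel_tol=1e-9, abs_tol=1e-12):
                return None  # not evenly spaced
        return step

    xres, yres = spacing(x), spacing(y)
    if xres is None or yres is None:
        return None
    # Shift by half a pixel so corner (0, 0) lands on the outer edge.
    return (xres, 0.0, x[0] - xres / 2, 0.0, yres, y[0] - yres / 2)


def detect_transform(attrs, x, y, rio_transform=None, explicit=None):
    """Apply the documented precedence (hypothetical helper)."""
    if explicit is not None:
        return tuple(explicit)
    if attrs.get("_xrspatial_no_georef"):
        return None
    if "transform" in attrs:
        return tuple(attrs["transform"])
    if rio_transform is not None:
        return tuple(rio_transform)
    return transform_from_coords(x, y)


def pixel_to_crs(transform, col, row):
    """Map a pixel-space corner (col, row) into CRS coordinates."""
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)


t = detect_transform({}, [0.5, 1.5, 2.5], [9.5, 8.5])
print(t)                       # (1.0, 0.0, 0.0, 0.0, -1.0, 10.0)
print(pixel_to_crs(t, 3, 2))   # (3.0, 8.0)
```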