xrspatial.polygonize.polygonize
- xrspatial.polygonize.polygonize(raster: DataArray, mask: DataArray | None = None, connectivity: int = 4, transform: ndarray | None = None, column_name: str = 'DN', return_type: str = 'numpy', simplify_tolerance: float | None = None, simplify_method: str = 'douglas-peucker', atol: float = 1e-08, rtol: float = 1e-05) -> Tuple[List[int | float], List[List[ndarray]]] | Tuple[List[int | float], ak.Array] | gpd.GeoDataFrame | spatialpandas.GeoDataFrame | Dict[str, Any]
Polygonize creates vector polygons for connected regions of pixels in a raster that group together by pixel value. It is a raster to vector converter.
For integer rasters, “same value” means strict equality. For float rasters, adjacent pixels are grouped when their values agree within a small numerical tolerance (controlled by atol and rtol), so floating-point noise from upstream arithmetic does not split otherwise identical regions. See the atol/rtol parameters below for the formula and for how to opt into strict float equality.
- Parameters:
raster (xr.DataArray) – Input raster.
mask (xr.DataArray, optional) – Optional input mask. Pixels to include should have mask values of 1 or True, pixels to exclude should have 0 or False. This is the opposite of a NumPy mask.
connectivity (int, default=4) – Whether to use 4-connectivity (pixels adjacent along an edge only) or 8-connectivity (pixels adjacent along an edge or diagonally) to determine which pixels are connected. Connectivity of 4 returns valid polygons (by shapely’s definition) provided both x and y are monotonically increasing or decreasing. Connectivity of 8 does not necessarily return valid polygons.
transform (ndarray, optional) – Optional affine transform to apply to the returned polygon coordinates.
column_name (str, default="DN") – Name to use for column returned. Only used if return_type is “geopandas” or “spatialpandas”.
return_type (str, default="numpy") – Format of returned data. Allowed values are “numpy”, “spatialpandas”, “geopandas”, “awkward” and “geojson”. “numpy” and “geojson” are always available; the others require optional dependencies.
simplify_tolerance (float, optional) – Simplification tolerance in coordinate units. When set, polygon boundaries are simplified using shared-edge decomposition to preserve topology between adjacent polygons. Default is None (no simplification).
For "douglas-peucker", this is the maximum perpendicular distance a vertex may deviate from the simplified line.
For "visvalingam-whyatt", this is the minimum triangle area threshold; vertices forming triangles smaller than this are removed.
simplify_method (str, default="douglas-peucker") – Simplification algorithm. Options are "douglas-peucker" (distance-based, good for general use) and "visvalingam-whyatt" (area-based, tends to produce better cartographic results).
atol (float, default=1e-8) – Absolute tolerance used when grouping adjacent float pixels. Two adjacent float values a and b are considered the same value (and merged into one polygon) when abs(a - b) <= atol + rtol * abs(a). Has no effect on integer rasters, which always use strict equality. Pass atol=0.0 together with rtol=0.0 to opt into strict equality for float rasters as well (useful when float values encode discrete category labels). The default matches numpy.isclose’s default atol and is exported as xrspatial.polygonize._DEFAULT_ATOL.
rtol (float, default=1e-5) – Relative tolerance used together with atol (see atol). Has no effect on integer rasters. The default matches numpy.isclose’s default rtol and is exported as xrspatial.polygonize._DEFAULT_RTOL.
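The grouping test quoted for atol and rtol can be tried out in isolation. The following is a minimal sketch in plain Python, not the library's implementation: the helper name same_value is hypothetical, and only the inequality and the default values are taken from the parameter descriptions above.

```python
def same_value(a, b, atol=1e-8, rtol=1e-5):
    """Hypothetical helper: the grouping inequality quoted in the atol description."""
    return abs(a - b) <= atol + rtol * abs(a)


# Floating-point noise: 0.1 + 0.2 is not exactly 0.3.
noisy = 0.1 + 0.2
clean = 0.3

# With the documented defaults the two values count as the same value.
grouped_by_default = same_value(noisy, clean)

# atol=0.0 and rtol=0.0 reduce the test to strict equality.
grouped_when_strict = same_value(noisy, clean, atol=0.0, rtol=0.0)

# A genuine difference of 0.001 exceeds the default tolerance near 1.0.
grouped_real_difference = same_value(1.0, 1.001)

print(grouped_by_default, grouped_when_strict, grouped_real_difference)
```

Under these assumptions the noisy pair is merged with the defaults and split under strict equality, which is the behaviour the zero-tolerance opt-in is meant to give for float rasters that encode category labels.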
- Returns:
Polygons and their corresponding values in a format determined by return_type:
“numpy” (default) – (column, polygon_points) where column is a list of pixel values and polygon_points is a list of polygons, each polygon a list of Nx2 np.ndarray rings (exterior first, then holes).
“awkward” – (column, ak.Array) of polygon coordinates.
“geopandas” – geopandas.GeoDataFrame with column_name and geometry columns.
“spatialpandas” – spatialpandas.GeoDataFrame.
“geojson” – dict representing a GeoJSON FeatureCollection.
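To make the “numpy” layout concrete, the sketch below walks a hand-written (column, polygon_points) pair and computes each polygon's area as the exterior ring minus its holes. The data is hypothetical and was not produced by polygonize; plain lists of (x, y) pairs stand in for the Nx2 arrays, and the code assumes each ring is closed (first vertex repeated at the end), which the description above does not state. The helper names are illustrative only.

```python
def ring_signed_area(ring):
    """Shoelace formula for a closed ring of (x, y) vertices."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def polygon_area(rings):
    """Area of one polygon: exterior ring first, then any holes."""
    exterior, *holes = rings
    hole_area = sum(abs(ring_signed_area(hole)) for hole in holes)
    return abs(ring_signed_area(exterior)) - hole_area


# Hypothetical data shaped like the "numpy" return value.
column = [1, 2]
polygon_points = [
    [  # value 1: a 4x4 square with a 2x2 hole
        [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)],
        [(1, 1), (1, 3), (3, 3), (3, 1), (1, 1)],
    ],
    [  # value 2: the 2x2 square that fills the hole
        [(1, 1), (3, 1), (3, 3), (1, 3), (1, 1)],
    ],
]

# Pair each pixel value with the area of its polygon.
areas = [(value, polygon_area(rings)) for value, rings in zip(column, polygon_points)]
print(areas)
```

A list of pairs is used rather than a dict because several polygons may share the same pixel value.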
Notes
CuPy and Dask+CuPy arrays are accepted as input. Data is transferred to CPU for processing because boundary tracing is an inherently sequential graph traversal (each step depends on the previous turn direction), preventing GPU parallelism. Output is always CPU-side numpy coordinate arrays regardless of input type.
For Dask+CuPy, each chunk is transferred independently, keeping peak CPU memory proportional to chunk size rather than full raster size.
When return_type="geopandas", the raster’s CRS is propagated to the output GeoDataFrame. The resolution order is raster.attrs['crs'], then raster.attrs['crs_wkt'], then raster.rio.crs (if rioxarray is installed). An unparseable CRS value is dropped rather than raised. The spatialpandas and geojson return types do not carry CRS metadata: spatialpandas has no CRS slot, and GeoJSON (RFC 7946) is WGS84 only.
When transform is not supplied explicitly, the raster’s affine transform is auto-detected in this order: raster.attrs['transform'] (xrspatial.geotiff convention, a rasterio-ordered 6-tuple), then raster.rio.transform() (if rioxarray is installed), then the raster’s own x/y coordinate values (the xarray / xrspatial standard georeferencing convention; used when the coords are 1-D, length >= 2 and evenly spaced). An explicit transform= argument always overrides the auto-detected value. Auto-detection is skipped when the raster carries attrs['_xrspatial_no_georef']=True. This applies to all return types – the geometries themselves are transformed, so the coordinates emitted in the “numpy”, “awkward”, “spatialpandas” and “geojson” outputs are also in CRS coordinate space, not pixel space, when the raster carries a transform.
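As a rough illustration of what a rasterio-ordered 6-tuple (a, b, c, d, e, f) does to pixel-space coordinates, and of how such a tuple could be derived from evenly spaced 1-D coordinates, consider the sketch below. Both helper names are hypothetical, and the half-pixel shift assumes the coordinates mark pixel centres; neither is confirmed as the library's internal behaviour, and the layout expected by the transform argument itself may differ.

```python
def apply_affine(transform, col, row):
    """Map a pixel-space (col, row) position to (x, y).

    Assumes the rasterio ordering (a, b, c, d, e, f), where
    x = a*col + b*row + c and y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)


def transform_from_coords(xs, ys):
    """Build an axis-aligned transform from evenly spaced 1-D coordinates.

    Assumption: coordinates are pixel centres, so the origin sits half a
    pixel before the first coordinate along each axis.
    """
    dx = xs[1] - xs[0]
    dy = ys[1] - ys[0]
    return (dx, 0.0, xs[0] - dx / 2.0, 0.0, dy, ys[0] - dy / 2.0)


# Hypothetical coordinates: x increasing, y decreasing (north-up raster).
xs = [100.5, 101.5, 102.5]
ys = [49.5, 48.5, 47.5]

t = transform_from_coords(xs, ys)
corner = apply_affine(t, 0, 0)      # outer corner of the first pixel
centre = apply_affine(t, 0.5, 0.5)  # centre of the first pixel

print(t, corner, centre)
```

With these inputs the centre of the first pixel maps back to (xs[0], ys[0]), which is a quick consistency check on the pixel-centre assumption.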