Caveats & Assumptions#

xarray-spatial makes several assumptions about input data that can produce wrong or confusing results if they aren’t met. This page collects them in one place. Each section maps to a Sphinx admonition level so you can gauge severity at a glance:

Danger

marks assumptions that cause silently wrong output.

Warning

marks gotchas that change the meaning of results.

Caution

marks performance or memory pitfalls.

Tip

gives practical workarounds.

Note

provides context that clarifies non-obvious behaviour.

Geodesic terrain functions assume WGS84 #

Danger

slope(), aspect(), curvature(), and hillshade() with method='geodesic' use the WGS84 ellipsoid (a = 6 378 137 m, b = 6 356 752 m). There is no parameter to select a different ellipsoid.

If your elevation data references a different ellipsoid (e.g. Clarke 1866 or Bessel 1841), the geodesic path will still run, but the ECEF projection and curvature correction will be slightly off. For most modern data this doesn’t matter because WGS84 and GRS80 are functionally identical, but older survey-quality datasets on local datums can differ by tens of metres in the geoid.

Projected coordinates + geodesic = wrong results #

Danger

Passing a raster whose coordinates are in metres (UTM, state-plane, etc.) to method='geodesic' will produce garbage. The geodesic path expects latitude/longitude in degrees.

The library does validate that coordinate values fall within geographic ranges (latitude in [-90, 90], longitude in [-180, 360]) and will raise a ValueError if they don’t. But this check is not foolproof: coordinates in small projected grids that happen to look like valid lat/lon values will slip through.

Tip

Use method='planar' (the default) for projected CRS data. Use method='geodesic' only when your coordinates are in degrees on a geographic CRS.

All output is float32 #

Warning

Every analytical function casts its output to float32 regardless of input dtype. If you pass float64 elevation data, the result is float32.

Float32 gives about 7 significant digits, which is more than enough for slope, NDVI, and similar metrics. But if you’re chaining many operations or working with data that needs sub-centimetre precision (e.g. LiDAR heights in a small range), the truncation can matter.

See Data Type Handling for the full rationale.

Tip

If you need float64 output, cast after the function returns:

result = slope(agg).astype('float64')

NaN is the nodata sentinel #

Warning

xarray-spatial uses NaN as its nodata value everywhere. There is no parameter to choose a different sentinel (e.g. -9999).

Functions generally propagate NaN: if any cell in a 3x3 kernel is NaN, the output cell is NaN. The exact behaviour varies by function:

Slope, aspect, hillshade – any NaN neighbour produces a NaN output cell.
Focal mean/std – NaN neighbours are excluded from the average; the cell itself is still computed.
Zonal stats – NaN values are excluded from the aggregation. An all-NaN zone returns NaN, not zero.
Hydrology (flow direction) – NaN cells are treated as impassable barriers. Flow cannot cross them.
Pathfinding (A*) – NaN cells are impassable, same as barriers.

Note

Edge cells (within the kernel radius of the raster boundary) default to NaN when boundary='nan'. Use boundary='nearest' or boundary='reflect' if you need values at the edges.

Proximity distances depend on coordinate units #

Warning

proximity() with distance_metric='EUCLIDEAN' (the default) computes distances in whatever units the DataArray’s coordinates use. If coordinates are default integer indices (0, 1, 2, …), the result is in pixel counts. If coordinates are in metres (UTM), the result is in metres. If coordinates are in degrees, the result is in degrees.

Switch to distance_metric='GREAT_CIRCLE' for lat/lon data to get distances in metres (the radius used is 6 378 137 m).

Tip

from xrspatial.proximity import proximity

# EUCLIDEAN with default coords = pixel-count distances
dist_px = proximity(raster)

# GREAT_CIRCLE with lat/lon coords = distances in metres
dist_m = proximity(raster, distance_metric='GREAT_CIRCLE')

Great-circle distances assume sphere radius = semi-major axis #

Note

Great-circle distances in proximity() and surface_distance() model the Earth as a sphere with radius = WGS84 semi-major axis (6 378 137 m), not the mean radius (~6 371 km).

The difference is about 0.1 %. This is negligible for most use cases but worth knowing if you’re comparing against a reference that uses the mean radius or the IUGG value.

Dask chunk sizes and `map_overlap`#

Caution

Focal, slope, aspect, and other kernel-based operations use dask.array.map_overlap. If any chunk dimension is smaller than the kernel depth (typically 1 cell for a 3x3 kernel), the overlap will fail or produce wrong results.

Caution

proximity() with Dask expands each chunk by max_distance cells. If max_distance is infinite (the default), the entire array is loaded into one chunk. Set a finite max_distance to keep memory bounded.

Tip

A good rule of thumb for focal operations: keep chunks at least 64 cells on each side. For proximity, set max_distance to the largest distance you actually care about.

import dask.array as da
import xarray as xr

# Rechunk to safe sizes before running slope
agg = xr.DataArray(
    da.from_array(big_array, chunks=(512, 512))
)

GPU memory is not managed automatically #

Caution

CuPy-backed operations allocate the full output array on the GPU. There is no automatic tiling or fallback to CPU if the array exceeds VRAM.

If you hit cuda.cudadrv.driver.CudaAPIError or cupy.cuda.memory .OutOfMemoryError, the array is too large for your GPU. Use Dask + CuPy to process in chunks, or fall back to NumPy.

Tip

Complex geodesic kernels (slope, aspect with method='geodesic') use many float64 registers, which limits thread-block occupancy. xarray-spatial already reduces thread blocks to 16x16 for these kernels, but very old GPUs (compute capability < 6.0) may still hit register spill.

Dimension ordering #

Note

xarray-spatial assumes the last two dimensions of a DataArray are (y, x) in that order. It does not inspect attrs['crs'] or CF conventions to verify this.

If your data has (x, y) ordering (rare, but it happens with some netCDF conventions), transpose before passing to any function:

agg = agg.transpose('y', 'x')

Classification and NaN/Inf #

Warning

equal_interval(), quantile(), natural_breaks(), and other classification functions set NaN and infinite input values to NaN in the output. They do not raise an error.

This means a raster with a few stray inf values (e.g. from a division by zero upstream) will silently produce NaN bins for those cells.

Tip

Clean infinities before classifying:

import numpy as np
agg = agg.where(np.isfinite(agg))