Why xarray-spatial?

What is a raster?

Rasters are regularly gridded datasets like GeoTIFFs, JPGs, and PNGs.

In the GIS world, rasters represent continuous phenomena (e.g. elevation, rainfall, distance), either directly as numerical values or as RGB images created for humans to view. Rasters typically have two spatial dimensions, but may have any number of other dimensions (time, type of measurement, etc.).
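As a rough illustration of a raster with an extra non-spatial dimension, the sketch below builds a small xr.DataArray with dimensions (time, y, x). The values, coordinate ranges, and the name rainfall are made up for the example and do not come from any real dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical data: 3 monthly rainfall grids on a 4 x 5 cell raster.
rng = np.random.default_rng(seed=0)
values = rng.random((3, 4, 5))

rainfall = xr.DataArray(
    values,
    dims=("time", "y", "x"),
    coords={
        "time": ["2024-01", "2024-02", "2024-03"],  # illustrative labels
        "y": np.linspace(45.0, 44.0, 4),            # north to south
        "x": np.linspace(-120.0, -119.0, 5),        # west to east
    },
    name="rainfall",
)

# Selecting one time step leaves a plain 2D (y, x) raster.
january = rainfall.isel(time=0)
print(rainfall.dims, january.dims)
```

Keeping the two spatial dimensions last and named y and x is only a convention assumed here; the point is that each extra dimension is a labeled axis on the same grid.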

What xarray-spatial is

xarray-spatial is a raster analysis library built on xarray. It has 150+ functions for surface analysis, hydrology, fire behavior, flood modeling, multispectral indices, proximity, classification, pathfinding, and interpolation. Every function takes an xr.DataArray and returns an xr.DataArray, so operations chain without conversions.
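The DataArray-in, DataArray-out contract can be sketched with plain xarray. The two helpers below, rescale and above, are hypothetical stand-ins written for this example and are not xarray-spatial functions; they only show why functions with this shape compose without conversion steps.

```python
import numpy as np
import xarray as xr


def rescale(raster: xr.DataArray) -> xr.DataArray:
    """Hypothetical helper: linearly rescale values to the range [0, 1]."""
    lo, hi = raster.min(), raster.max()
    return (raster - lo) / (hi - lo)


def above(raster: xr.DataArray, cutoff: float) -> xr.DataArray:
    """Hypothetical helper: 1.0 where the raster exceeds cutoff, else 0.0."""
    return (raster > cutoff).astype("float64")


elevation = xr.DataArray(
    np.arange(12, dtype="float64").reshape(3, 4),
    dims=("y", "x"),
    coords={"y": [2.0, 1.0, 0.0], "x": [0.0, 1.0, 2.0, 3.0]},
    name="elevation",
)

# Each step returns a DataArray, so the output of one feeds the next directly.
high_ground = above(rescale(elevation), 0.5)
print(high_ground.dims, float(high_ground.sum()))
```

Because every intermediate result is still a DataArray, the y and x coordinates travel along with the values through the whole chain.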

Functions dispatch automatically across four backends based on the type of the input array: NumPy, Dask, CuPy, and Dask+CuPy. The same slope(terrain) call works on an in-memory array, an out-of-core chunked array, or a GPU array. Not every function reaches the same maturity on every backend; the feature matrix in the README and the Stability policy and LTS commitment page track the per-function, per-backend tiers.
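To make the idea of dispatching on the input array type concrete, here is a minimal sketch. It is not xarray-spatial's actual dispatch code: grid_mean and its two private helpers are hypothetical, only the NumPy and Dask cases are shown, and Dask is treated as optional so the example still runs without it.

```python
import numpy as np

try:  # Dask is optional here; the sketch still runs without it.
    import dask.array as da
except ImportError:
    da = None


def _mean_numpy(data):
    # In-memory path: compute eagerly.
    return float(data.mean())


def _mean_dask(data):
    # Chunked path: build the lazy graph, then evaluate it.
    return float(data.mean().compute())


def grid_mean(data):
    """Hypothetical function: pick an implementation from the array type."""
    if da is not None and isinstance(data, da.Array):
        return _mean_dask(data)
    if isinstance(data, np.ndarray):
        return _mean_numpy(data)
    raise TypeError(f"unsupported array type: {type(data).__name__}")


grid = np.arange(6, dtype="float64").reshape(2, 3)
print(grid_mean(grid))
if da is not None:
    # Same call, different backend, chosen from the type alone.
    print(grid_mean(da.from_array(grid, chunks=(1, 3))))
```

For an xr.DataArray, the wrapped array that such a check would inspect is available as the .data attribute; a fuller version would add branches for CuPy and Dask+CuPy arrays in the same way.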

No GDAL, no GEOS

Within the Python ecosystem, many geospatial libraries (e.g. rasterio, rasterstats, geopandas) interface with the GDAL C++ library for raster and vector input, output, and analysis. GDAL is robust, performant, and has decades of great work behind it. For years, offloading expensive computations to C/C++ in this way has been a key performance strategy for Python libraries (the reference Python interpreter, CPython, is itself implemented in C).

However, wrapping GDAL has a few drawbacks for Python developers and data scientists:

  • GDAL can be difficult to build and install.

  • GDAL is hard for Python developers/analysts to extend, because it requires understanding multiple languages.

  • GDAL’s data structures are defined at the C/C++ level, which constrains how they can be accessed from Python.

xarray-spatial does not depend on GDAL or GEOS at all. The compute functions are implemented with Numba and Dask, and raster I/O, reprojection, compression codecs, and coordinate handling are pure Python and Numba (see GeoTIFF / COG). All of the source is readable Python without any “black box” barriers that obscure what is going on and prevent full optimization. Projects can use xarray-spatial where it fits while still using GDAL-based tools for other tasks.

How it fits the ecosystem

Any 2D xr.DataArray works as input, including ones produced by other libraries such as rioxarray; nothing about the inputs is specific to xarray-spatial’s own reader. The project grew out of Datashader, which provides fast rasterization of vector data (points, lines, polygons, meshes) for use with xarray-spatial.