xrspatial.geotiff.open_geotiff

xrspatial.geotiff.open_geotiff(source: str | BinaryIO, *, dtype: str | np.dtype | None = None, window: tuple | None = None, bbox: tuple | None = None, overview_level: int | None = None, band: int | None = None, default_name: str | None = None, name: str | None = <object object>, chunks: int | tuple | None = None, gpu: bool = False, max_pixels: int | None = None, max_cloud_bytes: int | None = <object object>, on_gpu_failure: str = <object object>, missing_sources: str = <object object>, allow_rotated: bool = False, allow_unparseable_crs: bool = False, allow_invalid_nodata: bool = False, stable_only: bool = False, allow_experimental_codecs: bool = False, allow_internal_only_jpeg: bool = False, band_nodata: str | None = None, masked: bool = False, mask_nodata: bool = <object object>, unpack: bool = False, mask_and_scale: bool = <object object>, parse_coordinates: bool = True, lock: object | None = None, cache: bool = True) → xr.DataArray

Read a GeoTIFF, COG, or VRT file into an xarray.DataArray.

Release-contract tier (see docs/source/reference/release_gate_geotiff.rst for the audited matrix and docs/source/reference/geotiff_release_contract.rst for the prose contract once that page lands):

  • [stable] Local-file reads on axis-aligned grids with an EPSG CRS in attrs['crs']; Tier 1 codecs (none / deflate / lzw / packbits / zstd); windowed reads via window=.

  • [advanced] Cloud / fsspec URIs, HTTP range reads, .vrt mosaics, external .tif.ovr sidecars, allow_rotated=True, allow_unparseable_crs=True, overview_level= selection. These paths work and are tested, but each carries a specific failure mode named on the parameter doc.

  • [experimental] gpu=True; LERC / JPEG2000 / J2K / LZ4 decode. No cross-backend numerical parity claim. JPEG-in-TIFF on the read side decodes best-effort with no parity claim against libtiff / GDAL / rasterio; the write side is [internal-only] (the encoder omits the required JPEGTables tag, so round-trips hold only for files this library itself wrote).

  • Out of scope for this release (allowed to raise): full GDAL VRT parity, warped / reprojection VRTs, rotated/sheared write support.

See xrspatial.geotiff.SUPPORTED_FEATURES for the full tier map. Per-parameter tier markers below describe the tier the parameter itself carries; a parameter’s effective tier is bounded by the function-level surface above (e.g. [stable] masked is still only stable when combined with a [stable] source, codec, and options).

The read/masking parameters mostly match rioxarray’s open_rasterio (masked, default_name, parse_coordinates, lock, cache) so callers can move between the two with minimal edits. masked defaults to False (no sentinel-to-NaN promotion), matching open_rasterio. The scale/offset option is named unpack here; rioxarray’s mask_and_scale is kept as a deprecated alias.

Automatically dispatches to the best backend:

  • gpu=True: GPU-accelerated read via nvCOMP (returns CuPy)

  • chunks=N: Dask lazy read via windowed chunks

  • gpu=True, chunks=N: Dask+CuPy for out-of-core GPU pipelines

  • Default: NumPy eager read
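The dispatch rules above can be read as a small decision table. The sketch below is only an illustration of that table: describe_backend is a hypothetical helper, not part of xrspatial, and it does not model the CPU fallback that on_gpu_failure controls.

```python
from typing import Tuple, Union

Chunks = Union[int, Tuple[int, ...], None]


def describe_backend(gpu: bool = False, chunks: Chunks = None) -> str:
    """Return the array backend the documented dispatch rules would select.

    Hypothetical helper for illustration; not part of xrspatial.
    """
    if gpu and chunks is not None:
        return "dask+cupy"  # out-of-core GPU pipeline
    if gpu:
        return "cupy"       # GPU-accelerated eager read
    if chunks is not None:
        return "dask"       # lazy read via windowed chunks
    return "numpy"          # default eager read


for options in ({}, {"chunks": 512}, {"gpu": True}, {"gpu": True, "chunks": 512}):
    print(options, "->", describe_backend(**options))
```

The two flags are independent, so checking the combined case first keeps each branch a direct restatement of one documented rule.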

VRT files are auto-detected by extension. The supported VRT subset is narrow on purpose. See the “VRT support matrix” section in docs/source/reference/geotiff.rst and the audited matrix in docs/source/reference/release_gate_geotiff.rst for the canonical contract. In short:

  • Supported: simple GDAL VRT mosaics over GeoTIFF sources; compatible CRS, transform orientation, pixel size, dtype, and band count across sources; clean windowed reads; lazy / dask reads over the same subset; explicit nodata with mixed-band rejection by default; missing_sources='raise' as the default.

  • Non-goals (allowed to raise): warped / reprojection VRTs, arbitrary resampling beyond the tested subset, mixed CRS / resolution / dtype / band metadata without an opt-in, nested VRTs, complex source / mask band / alpha band structures, full GDAL VRT parity.
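To make the "compatible across sources" requirement concrete, here is a minimal sketch of the kind of pre-flight comparison a caller could run on tile metadata before building a mosaic. The dictionary keys and the mismatched_fields helper are hypothetical and chosen for illustration; they are not the library's internal schema or its actual validation code.

```python
from typing import Dict, List

# Fields the supported subset expects to agree across sources.
# Key names are illustrative assumptions, not xrspatial's schema.
REQUIRED_MATCH = ("crs", "y_sign", "pixel_size", "dtype", "band_count")


def mismatched_fields(sources: List[Dict[str, object]]) -> List[str]:
    """Return the fields whose values differ from the first source."""
    if not sources:
        return []
    first = sources[0]
    return [
        key
        for key in REQUIRED_MATCH
        if any(src[key] != first[key] for src in sources[1:])
    ]


tile_a = {"crs": 32633, "y_sign": -1, "pixel_size": (10.0, 10.0),
          "dtype": "uint16", "band_count": 1}
tile_b = dict(tile_a)                                   # identical metadata
tile_c = dict(tile_a, dtype="float32", pixel_size=(20.0, 20.0))

print(mismatched_fields([tile_a, tile_b]))  # []
print(mismatched_fields([tile_a, tile_c]))  # ['pixel_size', 'dtype']
```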

Parameters:
  • source (str or binary file-like) – [stable for local file paths; advanced for HTTP/fsspec URIs, .vrt paths, and in-memory file-like buffers (the file-like path is restricted to the eager numpy reader – dask, GPU, VRT, and remote-URL paths require a string)] File path, HTTP URL, cloud URI (s3://, gs://, az://), or a binary file-like object (e.g. io.BytesIO) with read+seek.

  • dtype (str, numpy.dtype, or None) – [stable] Cast the result to this dtype after reading. None keeps the file’s native dtype. Float-to-int casts raise ValueError to prevent accidental data loss.

  • window (tuple or None) – [stable] (row_start, col_start, row_stop, col_stop) for windowed reading. Mutually exclusive with bbox=.

  • bbox (tuple or None) – [stable] (x_min, y_min, x_max, y_max) in the file’s CRS. Resolved to a pixel window= via a header-only metadata read and clamped to the file’s extent. Requires the source to be georeferenced with an axis-aligned transform; rotated affines require allow_rotated=True to clear the rotation first. Mutually exclusive with window=.

  • overview_level (int or None) – [advanced] Overview level (0 = full resolution). Must be a non-negative int or None; passing bool or any other type raises TypeError. External .tif.ovr sidecars are also [advanced] and are tested but not load-bearing for release-gate parity.

  • band (int or None) – [stable] Band index (0-based). None returns all bands.

  • default_name (str or None) – [stable] Name for the DataArray. None derives it from the source file name. Matches rioxarray’s open_rasterio parameter.

  • name (str or None) – [deprecated] Deprecated alias of default_name; emits a DeprecationWarning. Passing both default_name and name raises TypeError.

  • chunks (int, tuple, or None) – [stable] Chunk size for Dask lazy reading. Dask reads are gated against the eager reader by the cross-backend parity suite for the Tier 1 codec set.

  • gpu (bool) – [experimental] Use GPU-accelerated decompression. Requires cupy + numba CUDA plus optional nvCOMP / nvJPEG / nvJPEG2K libraries for codec-specific acceleration. The reader falls back to CPU when those libraries are unavailable unless on_gpu_failure='strict' is also set. No cross-backend numerical parity claim outside the Tier 1 codec set.

  • max_pixels (int or None) – [stable] Maximum allowed pixel count per materialised buffer. Without chunks= the cap bounds the full windowed region (width * height * samples); with chunks= the cap bounds each chunk’s decode buffer instead, so a small max_pixels no longer rejects a large lazy raster up front. None uses the default (~1 billion). Raise it to read legitimately large files. Exceeding the cap raises PixelSafetyLimitError (a ValueError subclass).

  • max_cloud_bytes (int or None, optional) – [advanced] Byte ceiling for eager reads from fsspec sources (s3://, gs://, az://, abfs://, memory://, …). fsspec cloud reads can run up cost on large objects; the budget defends against accidental large downloads, but the eager path still pulls the full object once the budget allows. The compressed object size is checked against this budget before any bytes are downloaded; a breach raises CloudSizeLimitError (a ValueError subclass). Default is 256 MiB, overridable via the XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES env var. Pass None to skip the check entirely. The HTTP path already reads only what it needs via range requests and is not subject to this limit. Has no effect on local file or file-like sources. Passing this kwarg with gpu=True, chunks=..., or a .vrt source raises ValueError because those backends do not apply the cloud-byte budget.

  • on_gpu_failure ({'auto', 'strict'}, optional) – [experimental] Forwarded to _read_geotiff_gpu when gpu=True. Controls whether GPU decode failures fall back to CPU ('auto', default) or re-raise the original exception ('strict'). Passing this kwarg with gpu=False raises ValueError because the policy only applies to the GPU pipeline. See _read_geotiff_gpu for the full description.

  • missing_sources ({'raise', 'warn'}, optional) – [advanced] VRT mosaics can return partial output under missing_sources='warn' when a backing source is unreadable; the attrs['vrt_holes'] entry records which sources were skipped so downstream code can detect the partial mosaic. Forwarded to _read_vrt when the source is a .vrt file. When the caller does not pass this kwarg, the public _read_vrt default applies ('raise'). 'raise' fails immediately on an unreadable backing source. 'warn' is the opt-in lenient mode: emit GeoTIFFFallbackWarning, record attrs['vrt_holes'], and return a partial mosaic. Passing this kwarg with a non-VRT source raises ValueError because the policy only applies to the VRT pipeline. See _read_vrt for the full description.

  • band_nodata ({'first', None}, optional) – [advanced] VRT-only. Opt-out for the fail-closed check that rejects VRT sources whose bands declare disagreeing per-band nodata sentinels. When None (the default), a VRT that mosaics bands with different sentinels raises MixedBandMetadataError; flattening to one value would let one band’s valid pixels collide with another band’s sentinel. Pass band_nodata='first' to keep the legacy behaviour of using band 0’s sentinel for the whole mosaic. Passing this kwarg with a non-VRT source raises ValueError because the policy only applies to the VRT pipeline.

  • masked (bool, default False) – [stable] If True, replace the nodata sentinel with NaN; integer rasters get promoted to float64 first so NaN can be represented. If False (the default), skip the sentinel-to-NaN step and keep the source dtype. attrs['nodata'] still carries the raw sentinel either way, so downstream code can mask explicitly. The default matches rioxarray’s open_rasterio (masked=False); note that earlier xrspatial releases masked by default (mask_nodata=True), so a bare open_geotiff(path) no longer promotes the sentinel to NaN. Pass masked=True and dtype=<integer> together on a source with a maskable sentinel and the read raises ValueError, because the unconditional float64 promotion (issue #2990) makes the integer cast lossy whether or not a sentinel pixel is present.

  • mask_nodata (bool) – [deprecated] Deprecated alias of masked; emits a DeprecationWarning. Passing both masked and mask_nodata raises TypeError. Note the default also changed from mask_nodata=True to masked=False.

  • unpack (bool, default False) – [experimental] If True, read the source’s GDAL SCALE / OFFSET metadata and return data * scale + offset, masking the nodata sentinel to NaN as well. This unpacks CF-packed data (integers stored with a scale / offset that recover floats) and is the inverse of the writer’s pack option. The applied values are recorded on attrs['scale_factor'] / attrs['add_offset']. A source without scale / offset metadata skips the scaling step, but the sentinel-to-NaN mask still runs: a declared nodata sentinel is replaced with NaN and an integer source is promoted to float64 (matching rioxarray’s mask_and_scale). Only a source with neither scale / offset metadata nor a nodata sentinel reads unchanged. A dataset-level scale / offset, or per-band values that agree across bands, applies to the whole array. A source whose per-band scale / offset differ raises MixedBandMetadataError unless band= selects a single band, in which case that band’s scale / offset is applied. Supported on the CPU eager, dask, GPU (gpu=True), and dask+GPU (gpu=True, chunks=) paths; combining it with a .vrt source raises ValueError. On the dask+GPU path, unpack=True reads through the CPU-decode-then-upload route rather than the direct disk->GPU GDS fast path (the GDS path has no scale step), so a local tiled COG that would otherwise stream straight to the device decodes on CPU first. Round-trip caveat: the source’s SCALE / OFFSET tags stay on attrs['gdal_metadata'] / attrs['gdal_metadata_xml'] after the read, so writing an unpack=True result back out with to_geotiff re-embeds them, and reading that file again with unpack=True applies the scale a second time. Drop those tags (and attrs['scale_factor'] / attrs['add_offset']) before writing if you need a clean round-trip.

  • mask_and_scale (bool) – [deprecated] Deprecated alias of unpack; emits a DeprecationWarning. Named after rioxarray’s open_rasterio. Passing both unpack and mask_and_scale raises TypeError.

  • parse_coordinates (bool, default True) – [stable] If True (the default), build x / y coordinate arrays from the transform. If False, skip them and return a DataArray with only dimensions (matching rioxarray’s open_rasterio); attrs['transform'] and attrs['crs'] still carry the georeferencing, and the band coord is kept. Supported on the CPU eager and dask paths; combining parse_coordinates=False with gpu=True or a .vrt source raises ValueError.

  • lock (object or None) – [advanced] Accepted for open_rasterio signature compatibility but has no effect: xrspatial’s reader re-opens the source per window, so there is no shared GDAL handle to lock. Passing a non-default value emits a GeoTIFFFallbackWarning.

  • cache (bool) – [advanced] Accepted for open_rasterio signature compatibility but has no effect: xrspatial has no caching backend to toggle. Passing a non-default value emits a GeoTIFFFallbackWarning.

  • allow_rotated (bool, default False) – [advanced] Read-only opt-in. to_geotiff does not currently emit rotated_affine; it rejects DataArrays that carry the attr (ValueError naming the attr) unless the caller passes drop_rotation=True to accept the loss explicitly. Read-side opt-in for rotated / sheared ModelTransformationTag files. By default the reader raises RotatedTransformError (a GeoTIFFAmbiguousMetadataError / ValueError subclass) because the rest of xrspatial assumes an axis-aligned grid. allow_rotated=True reads the pixel grid without the geospatial assumption: the result has integer pixel coords on x / y and both attrs['crs'] and attrs['crs_wkt'] are dropped. The CRS attrs are dropped together with the transform because keeping them while the axis-aligned transform is gone misleads downstream code that gates on "crs" in da.attrs to mean the array is spatially usable. The rotated 6-tuple itself is surfaced on attrs['rotated_affine'] as (a, b, c, d, e, f) (rasterio Affine ordering) so consumers that know how to handle rotated rasters can recover the mapping. The contract is read-only – writes must either reproject onto an axis-aligned grid first, or pass drop_rotation=True to to_geotiff / _write_geotiff_gpu to accept the loss; the ModelTransformationTag emit path is tracked separately.

  • allow_unparseable_crs (bool, default False) – [advanced] Read-side opt-in for CRS strings that pyproj cannot resolve and that do not parse as WKT. When False (the default), an unrecognised CRS payload raises UnparseableCRSError instead of landing in attrs['crs_wkt'] verbatim. Set to True to keep the permissive behaviour where the citation field passes through unchanged. Matches the same kwarg on to_geotiff / _write_geotiff_gpu so a value the reader accepted can survive a round-trip.

  • allow_invalid_nodata (bool, default False) – [advanced] Read-side opt-in for integer-dtype sources whose GDAL_NODATA tag is non-finite ("NaN", "Inf", "-Inf") or fractional (e.g. "3.5" on a uint16 file). The legacy reader parsed the value into attrs['nodata'] and silently skipped the masking step, so callers had no way to tell a silently-ignored sentinel from a missing one. When False (the default), the read raises InvalidIntegerNodataError. Set to True to keep the pre-rejection no-op behaviour for files known to carry such sentinels (e.g. external tooling that writes "nan" on integer outputs).

  • stable_only (bool, default False) – [advanced] Read-side opt-in that restricts the read to the stable-tier local-file path. When True, advanced-tier sources are rejected: a .vrt source raises VRTStableSourcesOnlyError because reader.vrt and the VRT child-source pipeline sit at the advanced / experimental tiers in xrspatial.geotiff.SUPPORTED_FEATURES, and HTTP / fsspec sources (http(s)://, s3://, etc.) are rejected too because reader.http and reader.fsspec are also advanced. Only a local-file source riding the stable reader.local_file path and the per-source codec gate is accepted. The rejection names the offending source and the allow_experimental_codecs opt-in so the caller can unlock the broader tier set explicitly when needed. See docs/source/reference/release_gate_geotiff.rst. The VRT rejection is enforced today; the HTTP / fsspec rejection is the documented contract being rolled out and may not yet fire on every read path (tracked in issue #2820).

  • allow_experimental_codecs (bool, default False) – Read-side opt-in for sources compressed with the Tier 3 experimental codecs (lerc, jpeg2000 / j2k, lz4). Default False rejects the read with ValueError naming the flag; cross-backend numerical parity is not claimed and reader support across GDAL versions is uneven. Matches the same kwarg on the writers so a round-trip through a Tier 3 codec stays opt-in on both sides. See SUPPORTED_FEATURES tier 'experimental'.

  • allow_internal_only_jpeg (bool, default False) – Read-side opt-in for JPEG-in-TIFF sources. The encoder writes self-contained JFIF tiles without the TIFF JPEGTables tag (347), so the read path is not interoperable with libtiff / GDAL / rasterio. allow_experimental_codecs=True does NOT cover this codec; the dedicated flag is its only gate. See SUPPORTED_FEATURES tier 'internal_only' for codec.jpeg.
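Several parameters above are documented as raising when combined with an incompatible source or backend. The sketch below collects those stated rules into one pre-flight check. option_conflicts is a hypothetical helper, not an xrspatial function. The case-insensitive .vrt suffix test is an assumption, and the exception type for the window= / bbox= clash is not stated in the parameter list, so it is reported without one.

```python
from typing import Any, List


def option_conflicts(source: str, **kwargs: Any) -> List[str]:
    """List documented keyword conflicts for a planned call (no file I/O).

    Hypothetical helper: it encodes only the rules stated in the parameter
    list and is not part of xrspatial.
    """
    # Assumption: a .vrt source is recognised by a case-insensitive suffix.
    is_vrt = source.lower().endswith(".vrt")
    gpu = bool(kwargs.get("gpu", False))
    chunked = kwargs.get("chunks") is not None
    problems: List[str] = []

    if kwargs.get("window") is not None and kwargs.get("bbox") is not None:
        problems.append("window= and bbox= are mutually exclusive")
    # Each deprecated alias clashes with its replacement (documented TypeError).
    for new, old in (("default_name", "name"),
                     ("masked", "mask_nodata"),
                     ("unpack", "mask_and_scale")):
        if new in kwargs and old in kwargs:
            problems.append(f"TypeError: {new}= passed together with {old}=")
    # The remaining rules are documented as ValueError.
    if "max_cloud_bytes" in kwargs and (gpu or chunked or is_vrt):
        problems.append("ValueError: max_cloud_bytes= with gpu, chunks, or .vrt")
    if "on_gpu_failure" in kwargs and not gpu:
        problems.append("ValueError: on_gpu_failure= needs gpu=True")
    if "missing_sources" in kwargs and not is_vrt:
        problems.append("ValueError: missing_sources= needs a .vrt source")
    if kwargs.get("band_nodata") is not None and not is_vrt:
        problems.append("ValueError: band_nodata= needs a .vrt source")
    if kwargs.get("unpack") and is_vrt:
        problems.append("ValueError: unpack=True with a .vrt source")
    if kwargs.get("parse_coordinates", True) is False and (gpu or is_vrt):
        problems.append("ValueError: parse_coordinates=False with gpu or .vrt")
    return problems


print(option_conflicts("scene.tif", chunks=512))
print(option_conflicts("scene.tif", missing_sources="warn"))
print(option_conflicts("mosaic.vrt", unpack=True, parse_coordinates=False))
```

The helper returns messages rather than raising, so a caller can surface every conflict at once instead of fixing them one exception at a time.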

Returns:

NumPy, Dask, CuPy, or Dask+CuPy backed depending on options.

Return type:

xr.DataArray

Notes

The CRS is stored as an int EPSG code in attrs['crs'] whenever the file’s GeoKeys carry a recognized EPSG. Files whose CRS can only be expressed as WKT keep the WKT in attrs['crs_wkt'] and leave attrs['crs'] unset. to_geotiff accepts either an int EPSG or a WKT string in attrs['crs'] for backward compatibility.
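Downstream code therefore has to look in two places for the CRS. The following sketch shows one way to do that against a plain attrs mapping. crs_from_attrs is a hypothetical helper, and the sample attrs dictionaries are made up for illustration rather than taken from a real file.

```python
from typing import Mapping, Optional, Tuple


def crs_from_attrs(attrs: Mapping[str, object]) -> Tuple[Optional[str], object]:
    """Return (kind, value): ('epsg', int), ('wkt', str), or (None, None).

    Hypothetical helper; not part of xrspatial.
    """
    crs = attrs.get("crs")
    # bool is an int subclass, so exclude it explicitly.
    if isinstance(crs, int) and not isinstance(crs, bool):
        return "epsg", crs
    # A WKT string in attrs['crs'] is the backward-compatible form
    # that to_geotiff also accepts.
    if isinstance(crs, str) and crs:
        return "wkt", crs
    wkt = attrs.get("crs_wkt")
    if isinstance(wkt, str) and wkt:
        return "wkt", wkt
    return None, None


print(crs_from_attrs({"crs": 4326}))                    # ('epsg', 4326)
print(crs_from_attrs({"crs_wkt": 'LOCAL_CS["grid"]'}))  # ('wkt', 'LOCAL_CS["grid"]')
print(crs_from_attrs({}))                               # (None, None)
```

The last case is what a read with allow_rotated=True produces, since both CRS attrs are dropped there.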

The file’s GeoTransform is also surfaced as attrs['transform'], a rasterio-style 6-tuple (pixel_width, 0, origin_x, 0, pixel_height, origin_y). to_geotiff uses this attr verbatim when present, falling back to recomputing the transform from the y/x coord arrays only when it is missing. The attr is what makes write -> read -> write -> read round-trips bit-stable for rasters with fractional pixel sizes or origins.
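The same 6-tuple is enough to work out which pixel window a bbox= covers. Below is a minimal sketch of that arithmetic, assuming a north-up grid (pixel_width > 0, pixel_height < 0) and outward snapping to whole pixels. The library's own rounding rule is not documented here and may differ, and bbox_to_window is a hypothetical name.

```python
import math
from typing import Tuple

Transform = Tuple[float, float, float, float, float, float]
BBox = Tuple[float, float, float, float]


def bbox_to_window(bbox: BBox, transform: Transform,
                   height: int, width: int) -> Tuple[int, int, int, int]:
    """Map (x_min, y_min, x_max, y_max) to (row_start, col_start, row_stop, col_stop).

    Assumes a north-up grid and snaps outward to whole pixels, then clamps
    to the raster extent. Illustrative only; not xrspatial's implementation.
    """
    pixel_width, _, origin_x, _, pixel_height, origin_y = transform
    x_min, y_min, x_max, y_max = bbox
    col_start = math.floor((x_min - origin_x) / pixel_width)
    col_stop = math.ceil((x_max - origin_x) / pixel_width)
    # pixel_height is negative, so y_max maps to the smaller row index.
    row_start = math.floor((y_max - origin_y) / pixel_height)
    row_stop = math.ceil((y_min - origin_y) / pixel_height)

    def clamp(value: int, upper: int) -> int:
        return max(0, min(value, upper))

    return (clamp(row_start, height), clamp(col_start, width),
            clamp(row_stop, height), clamp(col_stop, width))


# 10 m pixels, upper-left corner at (500000, 4100000), 100 rows x 200 columns.
transform = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)
print(bbox_to_window((500025.0, 4099500.0, 500105.0, 4099975.0), transform, 100, 200))
# (2, 2, 50, 11)
print(bbox_to_window((499000.0, 4098000.0, 503000.0, 4101000.0), transform, 100, 200))
# (0, 0, 100, 200)
```

The second call shows the clamping step: a bbox larger than the file collapses to the full raster window.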

With masked=True, integer rasters with a nodata sentinel are promoted to float64 with NaN replacing the sentinel so downstream NaN-aware code works uniformly. The default masked=False keeps the source dtype and leaves the raw sentinel in the data; attrs['nodata'] still carries it either way. With masked=True, passing dtype=<integer> as well is not enough to keep an integer dtype: the sentinel-to-NaN promotion runs first and the subsequent integer cast then raises ValueError (float-to-int is lossy in a way users rarely intend). The promotion runs whenever the sentinel is maskable (finite, integer, in-range), whether or not any pixel matches it, so the eager and dask paths return the same float64 dtype for the same input (issue #2990). A sentinel that cannot match (out-of-range, non-finite, or fractional) leaves the source dtype, so dtype=<integer> works in that case.
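The promotion rule can be mimicked with plain NumPy to see why the dtype changes even when no pixel matches the sentinel. This is a sketch of the documented behaviour for integer rasters only. mask_sentinel and cast_result are hypothetical helpers that stand in for masked=True and dtype=, and are not the library's code.

```python
import numpy as np


def mask_sentinel(data: np.ndarray, nodata: float) -> np.ndarray:
    """Sketch of masked=True for an integer raster (illustrative only)."""
    info = np.iinfo(data.dtype)
    maskable = (
        bool(np.isfinite(nodata))
        and float(nodata).is_integer()
        and info.min <= nodata <= info.max
    )
    if not maskable:
        return data  # sentinel cannot match: source dtype is kept
    out = data.astype(np.float64)  # promotion happens even with zero matches
    out[out == nodata] = np.nan
    return out


def cast_result(arr: np.ndarray, dtype: str) -> np.ndarray:
    """Sketch of dtype=: refuse the lossy float-to-int cast."""
    target = np.dtype(dtype)
    if np.issubdtype(arr.dtype, np.floating) and np.issubdtype(target, np.integer):
        raise ValueError("float-to-int cast refused to prevent data loss")
    return arr.astype(target)


raw = np.array([[1, 2], [255, 4]], dtype=np.uint8)
masked = mask_sentinel(raw, 255)
print(masked.dtype, bool(np.isnan(masked[1, 0])))      # float64 True

no_match = mask_sentinel(np.array([1, 2, 3], dtype=np.uint8), 255)
print(no_match.dtype)                                  # float64

out_of_range = mask_sentinel(raw, 300)                 # 300 does not fit uint8
print(cast_result(out_of_range, "uint8").dtype)        # uint8

try:
    cast_result(masked, "uint8")
except ValueError as exc:
    print("raised:", exc)
```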

Examples

Safe VRT usage. Write a .vrt mosaic with to_geotiff and read it back with the fail-closed defaults:

>>> from xrspatial.geotiff import open_geotiff, to_geotiff
>>> # `data` is an xr.DataArray you already have in memory
>>> to_geotiff(data, 'mosaic.vrt')
>>> da = open_geotiff('mosaic.vrt')

Intentionally raises. A VRT whose source tiles disagree on their per-band nodata sentinels is rejected by the default band_nodata=None:

>>> from xrspatial.geotiff import MixedBandMetadataError
>>> try:
...     open_geotiff('mixed_nodata.vrt')
... except MixedBandMetadataError:
...     pass  # pass band_nodata='first' to opt back into the
...           # legacy flatten-to-band-0 semantics, or fix the
...           # source tiles.