xrspatial.geotiff.open_geotiff
- xrspatial.geotiff.open_geotiff(source: str | BinaryIO, *, dtype: str | np.dtype | None = None, window: tuple | None = None, bbox: tuple | None = None, overview_level: int | None = None, band: int | None = None, default_name: str | None = None, name: str | None = <object object>, chunks: int | tuple | None = None, gpu: bool = False, max_pixels: int | None = None, max_cloud_bytes: int | None = <object object>, on_gpu_failure: str = <object object>, missing_sources: str = <object object>, allow_rotated: bool = False, allow_unparseable_crs: bool = False, allow_invalid_nodata: bool = False, stable_only: bool = False, allow_experimental_codecs: bool = False, allow_internal_only_jpeg: bool = False, band_nodata: str | None = None, masked: bool = False, mask_nodata: bool = <object object>, unpack: bool = False, mask_and_scale: bool = <object object>, parse_coordinates: bool = True, lock: object | None = None, cache: bool = True) -> xr.DataArray
Read a GeoTIFF, COG, or VRT file into an xarray.DataArray.
Release-contract tier (see docs/source/reference/release_gate_geotiff.rst for the audited matrix and docs/source/reference/geotiff_release_contract.rst for the prose contract once that page lands):
- [stable] Local-file reads on axis-aligned grids with an EPSG CRS in attrs['crs']; Tier 1 codecs (none / deflate / lzw / packbits / zstd); windowed reads via window=.
- [advanced] Cloud / fsspec URIs, HTTP range reads, .vrt mosaics, external .tif.ovr sidecars, allow_rotated=True, allow_unparseable_crs=True, overview_level= selection. These paths work and are tested, but each carries a specific failure mode named in its parameter doc.
- [experimental] gpu=True; LERC / JPEG2000 / J2K / LZ4 decode. No cross-backend numerical parity claim. JPEG-in-TIFF on the read side decodes best-effort with no parity claim against libtiff / GDAL / rasterio; the write side is [internal-only] (the encoder omits the required JPEGTables tag, so round-trips hold only for files this library itself wrote).
- Out of scope for this release (allowed to raise): full GDAL VRT parity, warped / reprojection VRTs, rotated / sheared write support.
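On the codec side, these tiers work like a lookup table plus two independent opt-in flags: allow_experimental_codecs and allow_internal_only_jpeg, both documented under Parameters. The following is a minimal sketch of that gate, not xrspatial's implementation.

- CODEC_TIERS and check_codec are hypothetical names, and the codec spellings are assumptions.
- Raising ValueError for the JPEG case is also an assumption, because the docs name the error type only for the experimental flag.
- The real tier map is xrspatial.geotiff.SUPPORTED_FEATURES.

```python
# Hypothetical tier table mirroring the prose above; the real map is
# xrspatial.geotiff.SUPPORTED_FEATURES, whose structure may differ.
CODEC_TIERS = {
    "none": "stable",
    "deflate": "stable",
    "lzw": "stable",
    "packbits": "stable",
    "zstd": "stable",
    "lerc": "experimental",
    "jpeg2000": "experimental",
    "j2k": "experimental",
    "lz4": "experimental",
    "jpeg": "internal_only",
}


def check_codec(codec, allow_experimental_codecs=False,
                allow_internal_only_jpeg=False):
    """Return the codec's tier, or raise ValueError if it is not opted in."""
    tier = CODEC_TIERS.get(codec)
    if tier is None:
        raise ValueError(f"unsupported codec: {codec!r}")
    if tier == "experimental" and not allow_experimental_codecs:
        raise ValueError(
            f"{codec!r} is experimental; pass allow_experimental_codecs=True")
    # JPEG has its own gate: the experimental flag deliberately does not cover it.
    if tier == "internal_only" and not allow_internal_only_jpeg:
        raise ValueError(
            f"{codec!r} is internal-only; pass allow_internal_only_jpeg=True")
    return tier
```

For example, check_codec("deflate") returns "stable". check_codec("jpeg", allow_experimental_codecs=True) still raises, because only the dedicated JPEG flag unlocks that codec.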
See xrspatial.geotiff.SUPPORTED_FEATURES for the full tier map. Per-parameter tier markers below describe the tier the parameter itself carries; a parameter’s effective tier is bounded by the function-level surface above (e.g. [stable] masked is still only stable when combined with a [stable] source, codec, and options).
The read/masking parameters mostly match rioxarray’s open_rasterio (masked, default_name, parse_coordinates, lock, cache) so callers can move between the two with minimal edits. masked defaults to False (no sentinel-to-NaN promotion), matching open_rasterio. The scale/offset option is named unpack here; mask_and_scale, the name rioxarray’s open_rasterio uses, is kept as a deprecated alias.
Automatically dispatches to the best backend:
- gpu=True: GPU-accelerated read via nvCOMP (returns CuPy)
- chunks=N: Dask lazy read via windowed chunks
- gpu=True, chunks=N: Dask+CuPy for out-of-core GPU pipelines
- Default: NumPy eager read
VRT files are auto-detected by extension. The supported VRT subset is narrow on purpose. See the “VRT support matrix” section in docs/source/reference/geotiff.rst and the audited matrix in docs/source/reference/release_gate_geotiff.rst for the canonical contract. In short:
Supported: simple GDAL VRT mosaics over GeoTIFF sources; compatible CRS, transform orientation, pixel size, dtype, and band count across sources; clean windowed reads; lazy / dask reads over the same subset; explicit nodata with mixed-band rejection by default; missing_sources='raise' as the default.
Non-goals (allowed to raise): warped / reprojection VRTs, arbitrary resampling beyond the tested subset, mixed CRS / resolution / dtype / band metadata without an opt-in, nested VRTs, complex source / mask band / alpha band structures, full GDAL VRT parity.
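The routing rules above (VRTs by file extension, remote sources by URI scheme) also decide what stable_only=True rejects. Here is a minimal sketch under stated assumptions:

- classify_source and check_stable_only are hypothetical helpers.
- The scheme sets list only the examples the docs name.
- A plain ValueError stands in for the library's own exception types.
- It encodes the documented contract, including the HTTP / fsspec rejection, which may not yet fire on every real read path.

```python
from urllib.parse import urlparse

# Hypothetical scheme sets for illustration; the real reader may route more
# schemes (the fsspec list in the parameter docs ends with an ellipsis).
HTTP_SCHEMES = {"http", "https"}
FSSPEC_SCHEMES = {"s3", "gs", "az", "abfs", "memory"}


def classify_source(source):
    """Return 'http', 'fsspec', 'vrt', or 'local_file' for a string source."""
    scheme = urlparse(source).scheme.lower()
    if scheme in HTTP_SCHEMES:
        return "http"
    if scheme in FSSPEC_SCHEMES:
        return "fsspec"
    # VRTs are recognised by file extension, not by reading the file.
    if source.lower().endswith(".vrt"):
        return "vrt"
    return "local_file"


def check_stable_only(source):
    """Apply the documented stable_only=True contract to a string source."""
    kind = classify_source(source)
    if kind != "local_file":
        # The real reader raises VRTStableSourcesOnlyError for VRTs; a plain
        # ValueError stands in here for every rejected kind.
        raise ValueError(
            f"stable_only=True rejects {kind} source {source!r}; "
            "see allow_experimental_codecs to opt into broader tiers")
    return kind
```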
- Parameters:
source (str or binary file-like) – [stable for local file paths; advanced for HTTP/fsspec URIs, .vrt paths, and in-memory file-like buffers (the file-like path is restricted to the eager NumPy reader; dask, GPU, VRT, and remote-URL paths require a string)] File path, HTTP URL, cloud URI (s3://, gs://, az://), or a binary file-like object (e.g. io.BytesIO) supporting read and seek.
dtype (str, numpy.dtype, or None) – [stable] Cast the result to this dtype after reading. None keeps the file’s native dtype. Float-to-int casts raise ValueError to prevent accidental data loss.
window (tuple or None) – [stable] (row_start, col_start, row_stop, col_stop) for windowed reading. Mutually exclusive with bbox=.
bbox (tuple or None) – [stable] (x_min, y_min, x_max, y_max) in the file’s CRS. Resolved to a pixel window= via a header-only metadata read and clamped to the file’s extent. Requires the source to be georeferenced with an axis-aligned transform; rotated affines require allow_rotated=True to clear the rotation first. Mutually exclusive with window=.
overview_level (int or None) – [advanced] Overview level (0 = full resolution). Must be a non-negative int or None; passing bool or any other type raises TypeError. External .tif.ovr sidecars are also [advanced] and are tested but not load-bearing for release-gate parity.
band (int or None) – [stable] Band index (0-based). None returns all bands.
default_name (str or None) – [stable] Name for the DataArray. None derives it from the source file name. Matches rioxarray’s open_rasterio parameter.
name (str or None) – [deprecated] Deprecated alias of default_name; emits a DeprecationWarning. Passing both default_name and name raises TypeError.
chunks (int, tuple, or None) – [stable] Chunk size for Dask lazy reading. Dask reads are gated against the eager reader by the cross-backend parity suite for the Tier 1 codec set.
gpu (bool) – [experimental] Use GPU-accelerated decompression. Requires cupy + numba CUDA plus optional nvCOMP / nvJPEG / nvJPEG2K libraries for codec-specific acceleration. The reader falls back to CPU when those libraries are unavailable unless on_gpu_failure='strict' is also set. No cross-backend numerical parity claim outside the Tier 1 codec set.
max_pixels (int or None) – [stable] Maximum allowed pixel count per materialised buffer. Without chunks=, the cap bounds the full windowed region (width * height * samples); with chunks=, it bounds each chunk’s decode buffer instead, so a small max_pixels no longer rejects a large lazy raster up front. None uses the default (~1 billion). Raise it to read legitimately large files. Exceeding the cap raises PixelSafetyLimitError (a ValueError subclass).
max_cloud_bytes (int or None, optional) – [advanced] fsspec cloud reads can run up cost on large objects; the budget defends against accidental large downloads, but the eager path still pulls the full object once the budget allows. Byte ceiling for eager reads from fsspec sources (s3://, gs://, az://, abfs://, memory://, …). The compressed object size is checked against this budget before any bytes are downloaded; a breach raises CloudSizeLimitError (a ValueError subclass). Default is 256 MiB, overridable via the XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES env var. Pass None to skip the check entirely. The HTTP path already reads only what it needs via range requests and is not subject to this limit. Has no effect on local file or file-like sources. Passing this kwarg with gpu=True, chunks=..., or a .vrt source raises ValueError because those backends do not apply the cloud-byte budget.
on_gpu_failure ({'auto', 'strict'}, optional) – [experimental] Forwarded to _read_geotiff_gpu when gpu=True. Controls whether GPU decode failures fall back to CPU ('auto', default) or re-raise the original exception ('strict'). Passing this kwarg with gpu=False raises ValueError because the policy only applies to the GPU pipeline. See _read_geotiff_gpu for the full description.
missing_sources ({'raise', 'warn'}, optional) – [advanced] VRT mosaics can return partial output under missing_sources='warn' when a backing source is unreadable; the attrs['vrt_holes'] entry records which sources were skipped so downstream code can detect the partial mosaic. Forwarded to _read_vrt when the source is a .vrt file. When the caller does not pass this kwarg, the public _read_vrt default applies ('raise'). 'raise' fails immediately on an unreadable backing source. 'warn' is the opt-in lenient mode: emit GeoTIFFFallbackWarning, record attrs['vrt_holes'], and return a partial mosaic. Passing this kwarg with a non-VRT source raises ValueError because the policy only applies to the VRT pipeline. See _read_vrt for the full description.
band_nodata ({'first', None}, optional) – [advanced] VRT-only. Opt-out for the fail-closed check that rejects VRT sources whose bands declare disagreeing per-band nodata sentinels. When None (the default), a VRT that mosaics bands with different sentinels raises MixedBandMetadataError, because flattening to one value would let one band’s valid pixels collide with another band’s sentinel. Pass band_nodata='first' to keep the legacy behaviour of using band 0’s sentinel for the whole mosaic. Passing this kwarg with a non-VRT source raises ValueError because the policy only applies to the VRT pipeline.
masked (bool, default False) – [stable] If True, replace the nodata sentinel with NaN; integer rasters are promoted to float64 first so NaN can be represented. If False (the default), skip the sentinel-to-NaN step and keep the source dtype. attrs['nodata'] still carries the raw sentinel either way, so downstream code can mask explicitly. The default matches rioxarray’s open_rasterio(masked=False). Earlier xrspatial releases masked by default (mask_nodata=True), so a bare open_geotiff(path) no longer promotes the sentinel to NaN. Passing masked=True together with dtype=<integer> on a source with a maskable sentinel raises ValueError, because the unconditional float64 promotion (issue #2990) makes the integer cast lossy whether or not a sentinel pixel is present.
mask_nodata (bool) – [deprecated] Deprecated alias of masked; emits a DeprecationWarning. Passing both masked and mask_nodata raises TypeError. Note that the default also changed from mask_nodata=True to masked=False.
unpack (bool, default False) – [experimental] If True, read the source’s GDAL SCALE / OFFSET metadata and return data * scale + offset, also masking the nodata sentinel to NaN. This unpacks CF-packed data (integers stored with a scale / offset that recover floats) and is the inverse of the writer’s pack option. The applied values are recorded on attrs['scale_factor'] / attrs['add_offset']. A source without scale / offset metadata skips the scaling step, but the sentinel-to-NaN mask still runs: a declared nodata sentinel is replaced with NaN and an integer source is promoted to float64 (matching rioxarray’s mask_and_scale). Only a source with neither scale / offset metadata nor a nodata sentinel reads unchanged. A dataset-level scale / offset, or per-band values that agree across bands, apply to the whole array. A source whose per-band scale / offset values differ raises MixedBandMetadataError unless band= selects a single band, in which case that band’s scale / offset is applied. Supported on the CPU eager, dask, GPU (gpu=True), and dask+GPU (gpu=True, chunks=) paths; combining it with a .vrt source raises ValueError. On the dask+GPU path, unpack=True reads through the CPU-decode-then-upload route rather than the direct disk->GPU GDS fast path (the GDS path has no scale step), so a local tiled COG that would otherwise stream straight to the device decodes on CPU first. Round-trip caveat: the source’s SCALE / OFFSET tags stay on attrs['gdal_metadata'] / attrs['gdal_metadata_xml'] after the read. Writing an unpack=True result back out with to_geotiff therefore re-embeds them, and reading that file again with unpack=True applies the scale a second time. Drop those tags (and attrs['scale_factor'] / attrs['add_offset']) before writing if you need a clean round-trip.
mask_and_scale (bool) – [deprecated] Deprecated alias of unpack; emits a DeprecationWarning. Named after rioxarray’s open_rasterio parameter. Passing both unpack and mask_and_scale raises TypeError.
parse_coordinates (bool, default True) – [stable] If True (the default), build x / y coordinate arrays from the transform. If False, skip them and return a DataArray with only dimensions (matching rioxarray’s open_rasterio); attrs['transform'] and attrs['crs'] still carry the georeferencing, and the band coord is kept. Supported on the CPU eager and dask paths; combining parse_coordinates=False with gpu=True or a .vrt source raises ValueError.
lock (object or None) – [advanced] Accepted for open_rasterio signature compatibility but has no effect: xrspatial’s reader re-opens the source per window, so there is no shared GDAL handle to lock. Passing a non-default value emits a GeoTIFFFallbackWarning.
cache (bool) – [advanced] Accepted for open_rasterio signature compatibility but has no effect: xrspatial has no caching backend to toggle. Passing a non-default value emits a GeoTIFFFallbackWarning.
allow_rotated (bool, default False) – [advanced] Read-only opt-in. to_geotiff does not currently emit rotated_affine; it rejects DataArrays that carry the attr (ValueError naming the attr) unless the caller passes drop_rotation=True to accept the loss explicitly. This is the read-side opt-in for rotated / sheared ModelTransformationTag files. By default the reader raises RotatedTransformError (a GeoTIFFAmbiguousMetadataError / ValueError subclass) because the rest of xrspatial assumes an axis-aligned grid. allow_rotated=True reads the pixel grid without the geospatial assumption: the result has integer pixel coords on x / y, and both attrs['crs'] and attrs['crs_wkt'] are dropped. The CRS attrs are dropped together with the transform because keeping them once the axis-aligned transform is gone would mislead downstream code that treats "crs" in da.attrs as meaning the array is spatially usable. The rotated 6-tuple itself is surfaced on attrs['rotated_affine'] as (a, b, c, d, e, f) (rasterio Affine ordering) so consumers that know how to handle rotated rasters can recover the mapping. The contract is read-only: writes must either reproject onto an axis-aligned grid first, or pass drop_rotation=True to to_geotiff / _write_geotiff_gpu to accept the loss. The ModelTransformationTag emit path is tracked separately.
allow_unparseable_crs (bool, default False) – [advanced] Read-side opt-in for CRS strings that pyproj cannot resolve and that do not parse as WKT. When False (the default), an unrecognised CRS payload raises UnparseableCRSError instead of landing in attrs['crs_wkt'] verbatim. Set to True to keep the permissive behaviour where the citation field passes through unchanged. Matches the same kwarg on to_geotiff / _write_geotiff_gpu so a value the reader accepted can survive a round-trip.
allow_invalid_nodata (bool, default False) – [advanced] Read-side opt-in for integer-dtype sources whose GDAL_NODATA tag is non-finite ("NaN", "Inf", "-Inf") or fractional (e.g. "3.5" on a uint16 file). The legacy reader parsed the value into attrs['nodata'] and silently skipped the masking step, so callers could not tell a silently ignored sentinel from a missing one. When False (the default), the read raises InvalidIntegerNodataError. Set to True to keep the pre-rejection no-op behaviour for files known to carry such sentinels (e.g. external tooling that writes "nan" on integer outputs).
stable_only (bool, default False) – [advanced] Read-side opt-in that restricts the read to the stable-tier local-file path. When True, advanced-tier sources are rejected. A .vrt source raises VRTStableSourcesOnlyError because reader.vrt and the VRT child-source pipeline sit at the advanced / experimental tiers in xrspatial.geotiff.SUPPORTED_FEATURES. HTTP / fsspec sources (http(s)://, s3://, etc.) are rejected too, because reader.http and reader.fsspec are also advanced. Only a local-file source that goes through the stable reader.local_file path and the per-source codec gate is accepted. The rejection names the offending source and the allow_experimental_codecs opt-in, so the caller can unlock the broader tier set explicitly when needed. See docs/source/reference/release_gate_geotiff.rst. The VRT rejection is enforced today; the HTTP / fsspec rejection is the documented contract being rolled out and may not yet fire on every read path (tracked in issue #2820).
allow_experimental_codecs (bool, default False) – Read-side opt-in for sources compressed with the Tier 3 experimental codecs (lerc, jpeg2000 / j2k, lz4). The default False rejects the read with a ValueError naming the flag; cross-backend numerical parity is not claimed, and reader support across GDAL versions is uneven. Matches the same kwarg on the writers so a round-trip through a Tier 3 codec stays opt-in on both sides. See SUPPORTED_FEATURES tier 'experimental'.
allow_internal_only_jpeg (bool, default False) – Read-side opt-in for JPEG-in-TIFF sources. The encoder writes self-contained JFIF tiles without the TIFF JPEGTables tag (347), so the read path is not interoperable with libtiff / GDAL / rasterio. allow_experimental_codecs=True does NOT cover this codec; the dedicated flag is its only gate. See SUPPORTED_FEATURES tier 'internal_only' for codec.jpeg.
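bbox= is resolved to a pixel window from the header transform before any pixels are read. The sketch below shows one way to do that conversion for an axis-aligned grid.

- bbox_to_window is a hypothetical helper.
- Rounding outward to whole pixels is an assumption; the docs above do not state xrspatial's exact rounding rule.
- The transform uses the attrs['transform'] ordering described in the Notes.
- The result uses the (row_start, col_start, row_stop, col_stop) order that window= takes.

```python
import math


def bbox_to_window(bbox, transform, width, height):
    """Map (x_min, y_min, x_max, y_max) to (row_start, col_start, row_stop, col_stop).

    transform is the 6-tuple (a, b, c, d, e, f) stored in attrs['transform'],
    i.e. x = a*col + b*row + c and y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("bbox= requires an axis-aligned transform")
    x_min, y_min, x_max, y_max = bbox
    # Fractional column/row positions of the bbox edges. sorted() handles the
    # usual negative pixel height (north-up) as well as a positive one.
    col_lo, col_hi = sorted(((x_min - c) / a, (x_max - c) / a))
    row_lo, row_hi = sorted(((y_min - f) / e, (y_max - f) / e))

    def clamp(value, upper):
        return min(max(value, 0), upper)

    # Round outward so every pixel the bbox touches is included, then clamp
    # to the raster extent.
    return (
        clamp(math.floor(row_lo), height),
        clamp(math.floor(col_lo), width),
        clamp(math.ceil(row_hi), height),
        clamp(math.ceil(col_hi), width),
    )
```

Consider a 10 x 10 north-up grid with 10-unit pixels and origin (100, 200), i.e. transform (10.0, 0.0, 100.0, 0.0, -10.0, 200.0). On that grid, the bbox (120, 150, 160, 190) resolves to the window (1, 2, 5, 6).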
- Returns:
A DataArray backed by NumPy, Dask, CuPy, or Dask+CuPy arrays, depending on gpu= and chunks=.
- Return type:
xr.DataArray
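The backing array follows the dispatch rules listed near the top of this page. Below is a minimal sketch of that decision; select_backend is a hypothetical helper that returns a label instead of reading data.

The sketch also suggests why several defaults in the signature render as <object object>. A private sentinel presumably lets the function tell "not passed" apart from any value a caller could pass, so on_gpu_failure= can be rejected when it arrives with gpu=False.

```python
_UNSET = object()  # like the <object object> defaults shown in the signature


def select_backend(gpu=False, chunks=None, on_gpu_failure=_UNSET):
    """Name the array backend the dispatch rules would pick (sketch only)."""
    if on_gpu_failure is not _UNSET:
        # The sentinel separates "not passed" from any explicit value,
        # including an explicit None.
        if not gpu:
            raise ValueError("on_gpu_failure= only applies when gpu=True")
        if on_gpu_failure not in ("auto", "strict"):
            raise ValueError(f"unknown on_gpu_failure: {on_gpu_failure!r}")
    if gpu and chunks is not None:
        return "dask+cupy"  # out-of-core GPU pipeline
    if gpu:
        return "cupy"       # eager GPU decode
    if chunks is not None:
        return "dask"       # lazy windowed chunks
    return "numpy"          # default eager read
```

On a real result, type(da.data) shows which backend was used, for example numpy.ndarray or dask.array.Array.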
Notes
The CRS is stored as an int EPSG code in attrs['crs'] whenever the file’s GeoKeys carry a recognized EPSG. Files whose CRS can only be expressed as WKT keep the WKT in attrs['crs_wkt'] and leave attrs['crs'] unset. to_geotiff accepts either an int EPSG or a WKT string in attrs['crs'] for backward compatibility.
The file’s GeoTransform is also surfaced as attrs['transform'], a rasterio-style 6-tuple (pixel_width, 0, origin_x, 0, pixel_height, origin_y). to_geotiff uses this attr verbatim when present, falling back to recomputing the transform from the y/x coord arrays only when it is missing. The attr is what makes write -> read -> write -> read round-trips bit-stable for rasters with fractional pixel sizes or origins.
With masked=True, integer rasters with a nodata sentinel are promoted to float64 with NaN replacing the sentinel, so downstream NaN-aware code works uniformly. The default masked=False keeps the source dtype and leaves the raw sentinel in the data; attrs['nodata'] still carries it either way. With masked=True, also passing dtype=<integer> is not enough to keep an integer dtype: the sentinel-to-NaN promotion runs first, and the subsequent integer cast then raises ValueError (float-to-int is lossy in a way users rarely intend). The promotion runs whenever the sentinel is maskable (finite, integer, in-range), whether or not any pixel matches it, so the eager and dask paths return the same float64 dtype for the same input (issue #2990). A sentinel that cannot match (out-of-range, non-finite, or fractional) leaves the source dtype, so dtype=<integer> works in that case.
Examples
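The masking rules in the Notes above, and the related unpack= behaviour, reduce to a small decision about the sentinel. Below is a minimal pure-Python sketch on a flat list of integer pixels.

- read_pixels and is_maskable are hypothetical helpers.
- lo / hi stand for the source integer dtype's range.
- integer_dtype models passing dtype=<integer>.
- Applying the same maskability test to unpack's mask step is an assumption.

```python
import math


def is_maskable(nodata, lo, hi):
    """True if the sentinel can match a pixel of an integer dtype spanning [lo, hi]."""
    if nodata is None or not math.isfinite(nodata):
        return False
    return float(nodata).is_integer() and lo <= nodata <= hi


def read_pixels(raw, nodata, lo, hi, masked=False, unpack=False,
                scale=None, offset=None, integer_dtype=False):
    """Apply the masked= / unpack= rules to a flat list of integer pixels."""
    maskable = is_maskable(nodata, lo, hi)
    if masked and maskable and integer_dtype:
        # The float promotion runs even when no pixel matches, so the
        # requested integer cast would always be lossy.
        raise ValueError("masked=True with an integer dtype= would be lossy")
    do_mask = (masked or unpack) and maskable
    do_scale = unpack and (scale is not None or offset is not None)
    if not (do_mask or do_scale):
        return list(raw)  # source dtype kept, raw sentinel left in place
    out = [math.nan if do_mask and v == nodata else float(v) for v in raw]
    if do_scale:
        s = 1.0 if scale is None else scale
        o = 0.0 if offset is None else offset
        out = [v * s + o for v in out]  # NaN * s + o stays NaN
    return out
```

For a uint16-style range (0 to 65535) with sentinel 65535, read_pixels([0, 10, 65535], 65535, 0, 65535, unpack=True, scale=0.5, offset=1.0) yields 1.0, 6.0, and NaN.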
Safe VRT usage. Write a .vrt mosaic with to_geotiff (here data is a georeferenced DataArray) and read it back with the fail-closed defaults:
>>> from xrspatial.geotiff import open_geotiff, to_geotiff
>>> to_geotiff(data, 'mosaic.vrt')
>>> da = open_geotiff('mosaic.vrt')
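The fail-closed VRT defaults can be sketched in a few lines: the mixed-band nodata check behind band_nodata=None, and the missing_sources='raise' policy.

- resolve_band_nodata and read_sources are hypothetical helpers.
- A plain ValueError stands in for MixedBandMetadataError.
- UserWarning stands in for GeoTIFFFallbackWarning.
- The returned holes list only approximates what attrs['vrt_holes'] records.

```python
import math
import warnings


def _same_sentinel(a, b):
    # NaN never equals itself, so two NaN sentinels are treated as agreeing.
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def resolve_band_nodata(per_band, band_nodata=None):
    """Pick one mosaic-wide sentinel, failing closed when bands disagree."""
    if band_nodata not in (None, "first"):
        raise ValueError(f"unknown band_nodata: {band_nodata!r}")
    first = per_band[0]
    if all(_same_sentinel(first, other) for other in per_band[1:]):
        return first
    if band_nodata == "first":
        return first  # explicit opt-in to the legacy band-0 behaviour
    raise ValueError(f"bands declare different nodata sentinels: {per_band!r}")


def read_sources(paths, read_one, missing_sources="raise"):
    """Read each backing source; 'warn' records holes instead of failing."""
    if missing_sources not in ("raise", "warn"):
        raise ValueError(f"unknown missing_sources: {missing_sources!r}")
    tiles, holes = {}, []
    for path in paths:
        try:
            tiles[path] = read_one(path)
        except OSError:
            if missing_sources == "raise":
                raise
            warnings.warn(f"skipping unreadable source {path!r}", UserWarning)
            holes.append(path)
    return tiles, holes
```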
Intentionally raises. A VRT whose source tiles disagree on their per-band nodata sentinels is rejected by the default band_nodata=None:
>>> from xrspatial.geotiff import MixedBandMetadataError
>>> try:
...     open_geotiff('mixed_nodata.vrt')
... except MixedBandMetadataError:
...     pass  # pass band_nodata='first' to opt back into the
...           # legacy flatten-to-band-0 semantics, or fix the
...           # source tiles.
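The allow_rotated= contract is mostly attribute bookkeeping. Below is a minimal sketch under two assumptions:

- apply_rotation_policy is a hypothetical helper that only rewrites attrs. The real reader also replaces the x / y coords with integer pixel indices.
- A plain ValueError stands in for RotatedTransformError.

```python
def apply_rotation_policy(attrs, transform, allow_rotated=False):
    """Return the attrs a read would carry under the allow_rotated= rules."""
    a, b, c, d, e, f = transform
    if b == 0 and d == 0:
        # Axis-aligned: georeferencing is kept as-is.
        return {**attrs, "transform": transform}
    if not allow_rotated:
        # Stand-in for RotatedTransformError, which subclasses ValueError.
        raise ValueError("rotated/sheared transform; pass allow_rotated=True")
    # Drop the CRS attrs together with the transform so that a
    # "crs" in attrs check no longer implies a usable spatial grid.
    kept = {k: v for k, v in attrs.items()
            if k not in ("crs", "crs_wkt", "transform")}
    kept["rotated_affine"] = (a, b, c, d, e, f)  # rasterio Affine ordering
    return kept
```

The input dict is never mutated, so a caller can compare the attrs before and after the policy runs.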