xrspatial.geotiff.to_geotiff

xrspatial.geotiff.to_geotiff(data: xr.DataArray | np.ndarray, path: str | BinaryIO, *, crs: int | str | None = None, nodata: float | int | None = None, compression: str = 'zstd', compression_level: int | None = None, tiled: bool = True, tile_size: int = 256, predictor: bool | int = False, cog: bool = False, overview_levels: list[int] | None = None, overview_resampling: str = 'mean', bigtiff: bool | None = None, gpu: bool | None = None, streaming_buffer_bytes: int = 268435456, max_z_error: float = 0.0, photometric: str | int = 'auto', allow_internal_only_jpeg: bool = False, allow_experimental_codecs: bool = False, allow_unparseable_crs: bool = False, drop_rotation: bool = False, pack: bool = False, color_ramp: str | bool | None = None, color_ramp_range: tuple[float, float] | None = None) -> str | BinaryIO

Write data as a GeoTIFF or Cloud Optimized GeoTIFF.

Release-contract tier (see docs/source/reference/release_gate_geotiff.rst and docs/source/reference/geotiff_release_contract.rst):

  • [stable] Local-file output on an axis-aligned grid with compression in {'none', 'deflate', 'lzw', 'packbits', 'zstd'}; CRS / transform / nodata attrs round-trip; bigtiff auto-promotion; cog=True (the IFD-first tiled COG layout with a stable codec, covered by SUPPORTED_FEATURES['writer.cog']).

  • [advanced] Internal overview pyramid generation (SUPPORTED_FEATURES['writer.overviews']): the overview_levels and overview_resampling knobs and the pyramid bytes themselves. Also covered: explicit bigtiff=True, photometric= overrides, and extra_tags pass-through.

  • [experimental] GPU dispatch via gpu=True; compression in {'lerc', 'jpeg2000', 'j2k', 'lz4'} behind the explicit allow_experimental_codecs=True opt-in; allow_unparseable_crs=True.

  • [internal-only] compression='jpeg' behind allow_internal_only_jpeg=True. The produced files do not round-trip through libtiff / GDAL / rasterio; the path exists for xrspatial’s own use and is not part of the externally interoperable surface.

  • Out of scope for this release (allowed to raise): rotated / sheared write support (no ModelTransformationTag emit path); silent mixed-metadata flattening.

See xrspatial.geotiff.SUPPORTED_FEATURES for the full tier map. Per-parameter tier markers below describe the tier the parameter itself carries; a parameter’s effective tier is bounded by the function-level surface above (e.g. [stable] nodata is still only stable when combined with a [stable] codec and options).

Dask-backed DataArrays on the CPU path are written in streaming mode: one tile-row at a time, without materialising the full array into RAM. The per-compute budget is sized from the source chunk geometry, so a map_overlap source (e.g. slope / aspect) chunked taller than the tile stays within streaming_buffer_bytes instead of pulling several source chunk-rows at once (#3007). COG output (cog=True) still materialises because overviews need the full array.

Dask input routed to the GPU writer (auto-detected dask+cupy, or gpu=True with any dask backing) also streams: each tile-row band is computed onto the device, compressed, and released before the next, with the per-compute budget capped by streaming_buffer_bytes (issue #3166). The bound is on device memory only: the GPU writer still assembles the compressed file in host RAM before writing it out, unlike the CPU streaming path, which writes incrementally to disk. The exception is cog=True, which materialises the full array on device because overview generation needs it, and emits a GeoTIFFFallbackWarning when it does.
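
The budget arithmetic behind tile-row streaming can be sketched in a few lines. This is a simplified illustration, not xrspatial's implementation: plan_tile_row_segments is a hypothetical helper that assumes the CPU rule of splitting an over-budget tile-row into column segments made of whole tiles, and it ignores both the source-chunk sizing (#3007) and the GPU floor of one full-width tile-row.

```python
import math


def plan_tile_row_segments(height, width, bands, itemsize,
                           tile_size=256, streaming_buffer_bytes=268_435_456):
    """Hypothetical CPU streaming plan: (number of tile-rows, segment width in pixels)."""
    n_tile_rows = math.ceil(height / tile_size)
    n_tile_cols = math.ceil(width / tile_size)
    # Bytes materialised for one tile across all bands.
    bytes_per_tile = tile_size * tile_size * bands * itemsize
    if n_tile_cols * bytes_per_tile <= streaming_buffer_bytes:
        # The whole tile-row fits in the budget: compute it in one call.
        return n_tile_rows, n_tile_cols * tile_size
    # Otherwise split the tile-row into horizontal segments of whole tiles,
    # never narrower than one tile even if a single tile exceeds the budget.
    tiles_per_segment = max(1, streaming_buffer_bytes // bytes_per_tile)
    return n_tile_rows, tiles_per_segment * tile_size


# 10 000 x 10 000 float32: each tile-row is exactly 10 MiB, well under 256 MiB.
print(plan_tile_row_segments(10_000, 10_000, bands=1, itemsize=4))   # (40, 10240)
# A 1 000 000-pixel-wide float64 row is split into 131072-pixel segments.
print(plan_tile_row_segments(256, 1_000_000, bands=1, itemsize=8))   # (1, 131072)
```

The segment width is kept a multiple of the tile width so that every compute call produces complete tiles that can be compressed and written immediately.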

Automatically dispatches to GPU compression when:

  • gpu=True is passed, or

  • the input data is CuPy-backed (auto-detected).

GPU write uses nvCOMP batch compression (deflate/ZSTD) and keeps the array on device. Falls back to CPU if nvCOMP is not available.
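
The dispatch rule above can be sketched as a small predicate. wants_gpu is hypothetical, and it assumes two things: that CuPy arrays can be recognised by the module their type is defined in, and that a dask array exposes its chunk type through its (semi-private) _meta attribute. It does not model the nvCOMP availability check or the CPU fallback.

```python
import numpy as np


def wants_gpu(data, gpu=None):
    """Hypothetical sketch of the gpu= dispatch rule: True routes to the GPU writer."""
    if gpu is not None:
        # An explicit gpu=True / gpu=False always wins over auto-detection.
        return bool(gpu)
    # Unwrap an xarray.DataArray (it has .dims; plain arrays do not).
    arr = data.data if hasattr(data, "dims") else data
    # A dask array reports its chunk type through _meta; other arrays are their own meta.
    meta = getattr(arr, "_meta", arr)
    return type(meta).__module__.split(".")[0] == "cupy"


print(wants_gpu(np.zeros((4, 4))))            # False: NumPy data stays on the CPU path
print(wants_gpu(np.zeros((4, 4)), gpu=True))  # True: forced
```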

Parameters:
  • data (xr.DataArray or np.ndarray) – [stable] 2D raster data. 3D multi-band input is also accepted when its dims are (band, y, x) or (y, x, band).

  • path (str or binary file-like) – [stable for local file paths; advanced for io.BytesIO and other in-memory file-likes] Output file path, or any object exposing a write method (e.g. io.BytesIO). When a file-like is passed, the encoded TIFF bytes are written to that object once assembly completes. cog=True and .vrt outputs require a string path.

  • crs (int, numpy.integer, str, or None) –

    [stable for int EPSG codes; advanced for WKT/PROJ strings] EPSG code (int or numpy integer scalar), WKT string, or PROJ string. If None and data is a DataArray, tries to read from attrs (‘crs’ for EPSG, ‘crs_wkt’ for WKT).

    EPSG codes are strongly preferred for interop. The WKT-only path writes ProjectedCSType / GeographicType = 32767 with the WKT stored in GTCitationGeoKey – libgeotiff and GDAL can round-trip this but many other GeoTIFF readers treat the citation as a free-form name and lose the CRS. A UserWarning is emitted when the WKT-only path is taken.

  • nodata (float, int, or None) – [stable] NoData value.

  • compression (str) –

    [stable for {'none', 'deflate', 'lzw', 'packbits', 'zstd'}; experimental for {'lerc', 'jpeg2000', 'j2k', 'lz4'} behind allow_experimental_codecs=True; internal-only for 'jpeg' behind allow_internal_only_jpeg=True] Codec name. One of 'none', 'deflate', 'lzw', 'jpeg', 'packbits', 'zstd', 'lz4', 'jpeg2000' (alias 'j2k'), or 'lerc'.

    Stable codecs (Tier 1, lossless, byte-for-byte round-trip): 'none', 'deflate', 'lzw', 'packbits', 'zstd'.

    Experimental codecs (Tier 3): 'lerc', 'jpeg2000' / 'j2k', 'lz4'. Rejected by default; pass allow_experimental_codecs=True to opt in. The opt-in emits GeoTIFFFallbackWarning once per call so the caller knows the chosen codec carries no cross-backend numerical parity claim and has uneven reader support across GDAL versions. 'lerc' accepts max_z_error for lossy compression with a bounded per-pixel error.

    Internal-only codec (Tier 4): 'jpeg'. Rejected on write by default because the encoder omits the JPEGTables tag and the produced files do not round-trip through libtiff / GDAL / rasterio. Pass allow_internal_only_jpeg=True to opt in to the internal-reader-only path (see that parameter for details). allow_experimental_codecs=True does NOT cover 'jpeg': internal-only is a stricter tier than experimental, and the two flags do not collapse into one switch.

  • compression_level (int or None) –

    [stable] Compression effort level. None uses each codec’s default (6 for deflate/zstd). Valid ranges: deflate 1-9, zstd 1-22, lz4 0-16. Codecs without a level concept (lzw, packbits, jpeg) accept any value and ignore it.

    Out-of-range levels raise ValueError on every backend, including GPU dispatch. On the GPU path the nvCOMP encoder (deflate/zstd tiles) does not expose level control: an explicit level is validated but then ignored, and a UserWarning is emitted. Tiles the GPU writer compresses through the CPU codecs honor the level. Pass gpu=False if the exact level matters.

  • tiled (bool) – [stable] Use tiled layout (default True). Incompatible with cog=True because the COG specification requires a tiled internal layout; passing cog=True, tiled=False raises ValueError.

  • tile_size (int) – [stable] Tile size in pixels (default 256). Must be a positive multiple of 16 when tiled=True; this is a TIFF 6 spec requirement on TileWidth and TileLength for broad reader compatibility. Ignored when tiled=False; a warning is emitted if a non-default value is passed alongside strip mode.

  • predictor (bool or int) –

    [stable] TIFF predictor. Accepted values:

    • False, 0, or 1 -> no predictor.

    • True or 2 -> horizontal differencing (good for integer data; True and 2 are exactly equivalent).

    • 3 -> floating-point predictor (float dtypes only; typically gives better deflate/zstd ratios on float data than predictor 2).

  • cog (bool) – [stable] Write as Cloud Optimized GeoTIFF. The CPU writer emits the spec-conforming COG layout (IFD-first, tiled, internal overviews, lossless codec) covered by SUPPORTED_FEATURES['writer.cog']. Requires tiled=True (the default): the COG specification mandates a tiled internal layout, so cog=True, tiled=False raises ValueError. COG output also materialises the full array, because the overview pyramid needs random access to every pixel; the streaming_buffer_bytes kwarg is a no-op on this path. Customisation of the overview pyramid itself (overview_levels, overview_resampling) is tracked separately as advanced under SUPPORTED_FEATURES['writer.overviews'].

  • overview_levels (list[int] or None) – [advanced] Overview pyramids are an optional COG feature; the decimation factors and resampling choice affect downstream analytics in ways that are not byte-for-byte reproducible across backends. Overview decimation factors relative to full resolution. Each entry must be a power-of-two integer >= 2, and the list must be strictly increasing (e.g. [2, 4, 8] writes overviews at 1/2, 1/4 and 1/8 of the full resolution). Invalid values raise ValueError. Only used when cog=True. If None and cog=True, levels auto-generate as [2, 4, 8, ...] until the next halving would fall below tile_size (capped at 8 levels).

  • overview_resampling (str) – [advanced] Resampling method for overviews: ‘mean’ (default), ‘nearest’, ‘min’, ‘max’, ‘median’, ‘mode’, or ‘cubic’.

  • bigtiff (bool or None) – [advanced] BigTIFF uses 64-bit offsets; older readers that only speak classic TIFF cannot open the output. Force BigTIFF (64-bit offsets). None (default) auto-promotes when the estimated file size would exceed the classic-TIFF 4 GB limit. Matches the same kwarg on _write_geotiff_gpu.

  • gpu (bool or None) – [experimental] Requires cupy + numba CUDA, plus the optional nvCOMP / nvJPEG / nvJPEG2K libraries for codec-specific acceleration; backend parity with the CPU writer is tested for the Tier 1 codec set only. Force GPU compression. None (default) auto-detects CuPy data, including CuPy-backed dask arrays. Dask-backed input routed to the GPU writer streams one tile-row band at a time unless cog=True (see streaming_buffer_bytes).

  • streaming_buffer_bytes (int) – [stable] Soft cap on bytes materialised per dask compute call when streaming a dask-backed DataArray. Defaults to 256 MiB (268435456 bytes). Wide rasters whose tile-row exceeds this budget are split into horizontal segments on the CPU path. On the GPU path the cap bounds the device bytes computed per tile-row band, with a floor of one full-width tile-row (issue #3166). The kwarg is a no-op for in-memory (numpy / CuPy) input and for COG output, which materialises the full array because the overview pyramid needs it; a dask-backed cog=True GPU write emits a GeoTIFFFallbackWarning when it materialises.

  • max_z_error (float) – [experimental] Per-pixel error budget for LERC compression. 0.0 (default) is lossless; larger values let the encoder approximate values within the bound, producing smaller files at the cost of accuracy bounded by abs(decoded - original) <= max_z_error. Only used when compression='lerc' (which itself requires allow_experimental_codecs=True); passing a non-zero value with any other codec raises ValueError.

  • photometric (str or int) –

    [advanced] Photometric interpretation for the TIFF Photometric tag (262).

    • 'auto' (default) – MinIsBlack (1) for any band count. ExtraSamples for every band beyond the first is tagged 0 (unspecified). Multispectral rasters (e.g. R, G, B, NIR) round-trip through this default without being silently labelled as RGB+alpha. Prior versions treated any 3+ band array as RGB and the 4th band as unassociated alpha – the behaviour change is intentional.

    • 'rgb' – RGB (Photometric=2). Three colour bands; any additional bands are tagged 0 (unspecified).

    • 'rgba' – RGB with the 4th band tagged as unassociated alpha (TIFF ExtraSamples=2). Requires at least 4 bands.

    • 'minisblack' or 'miniswhite' – grayscale; multi-band extras tagged 0. Signed-integer pixel types with 'miniswhite' are rejected with NotImplementedError: xrspatial has no semantically correct inversion for signed MinIsWhite, and the silent passthrough of earlier versions produced files whose pixel values contradicted the on-disk Photometric tag in every standards-compliant TIFF reader. Cast to an unsigned dtype or pass photometric='minisblack'.

    • An int – written verbatim into Photometric for advanced callers (e.g. 3 for Palette, 5 for CMYK).

    A user-supplied extra_tags entry of (TAG_PHOTOMETRIC, ...) or (TAG_EXTRA_SAMPLES, ...) overrides the writer’s chosen value; only these two tag ids are overridable so other auto-emitted tags such as ImageWidth or StripOffsets remain protected.

  • allow_experimental_codecs (bool) – [experimental] Opt in to the Tier 3 experimental codecs 'lerc', 'jpeg2000' / 'j2k', and 'lz4' (default False). Setting compression= to one of those codecs without this flag raises a ValueError whose message names the flag. With the flag set, the write proceeds and a GeoTIFFFallbackWarning is emitted once per call so the caller knows the chosen codec carries no cross-backend numerical parity claim and has uneven reader support across GDAL versions. Does NOT cover compression='jpeg': the internal-only JPEG path keeps its own dedicated allow_internal_only_jpeg flag because internal-only is a stricter tier than experimental. The kwarg is forwarded unchanged to _write_geotiff_gpu on the GPU dispatch path.

  • allow_internal_only_jpeg (bool) – [internal-only] Opt in to the compression='jpeg' encode path (default False). The encoder writes self-contained JFIF tiles without the TIFF JPEGTables tag (347); the file decodes through this library’s reader but not through libtiff, GDAL, or rasterio. This codec is internal-only for the release contract: it is not externally interoperable and the path exists so xrspatial can round-trip its own JPEG output. With the flag set, the write proceeds and a GeoTIFFFallbackWarning is emitted at call time. Without the flag, compression='jpeg' raises ValueError. The kwarg is forwarded unchanged to _write_geotiff_gpu on the GPU dispatch path so callers can reach the same internal-only encode via to_geotiff(..., gpu=True).

  • allow_unparseable_crs (bool) – [experimental] Opt in to writing an unvalidatable CRS string into GTCitationGeoKey (default False). When False (the default), a crs= value that is not an EPSG int, not a string pyproj can resolve, and not structurally WKT (no PROJCS / GEOGCS / PROJCRS / GEOGCRS root) raises ValueError instead of landing verbatim in the citation field. Set to True to keep the permissive behaviour. The fail-closed default prevents files in which a mistyped PROJ string, or an "EPSG:4326" token on a host without pyproj, becomes a citation that most readers cannot interpret.

  • drop_rotation (bool, default False) – [advanced] Opt in to writing a DataArray that carries attrs['rotated_affine']. The reader sets that attr when called with allow_rotated=True on a file whose ModelTransformationTag contains rotation, shear, or z-coupling terms. The writer does not emit a ModelTransformationTag, so passing such a DataArray through to_geotiff produces an identity-affine output and loses the rotated mapping; a subsequent open_geotiff round-trip cannot recover it. Default False refuses the write with ValueError so the loss is impossible without an explicit signal. drop_rotation=True accepts the loss and lets the write proceed; consumers reading the output will see an axis-aligned, non-rotated TIFF.

  • pack (bool, default False) –

    [experimental] Inverse of open_geotiff(unpack=True). Re-pack a decoded float array before writing: reverse the scale / offset recorded on attrs['scale_factor'] / attrs['add_offset'], fill NaN back to the nodata sentinel, and cast to the source dtype recorded on attrs['mask_and_scale_dtype'] (contract v5; the attr keeps its historical name). The recorded dtype is the on-disk one, so an integer source gets its integer dtype back and a float32 source stays float32 rather than widening to float64 (#3080). The output stores the raw packed values and keeps the SCALE / OFFSET GDAL_METADATA, so reopening it with unpack=True unpacks to the original values instead of scaling a second time.

    Raises ValueError for a bare array (no attrs) or one that never went through an unpack read. When the attr is absent (arrays read before contract v5), the dtype falls back to the width of an integer-typed attrs['nodata'], or to the buffer’s own dtype when the sentinel is a float (a float sentinel is stored as a plain Python float and carries no width information). An explicit nodata= kwarg overrides the attrs sentinel as the NaN fill value, so the filled pixels always agree with the GDAL_NODATA tag the writer emits.

    When NaN pixels exist with no sentinel to fill them and an integer dtype must be restored, the ValueError is raised at call time for numpy-backed data; for dask-backed data the check runs per chunk inside the write’s compute (issue #3235), so the error surfaces during the write. The same timing applies to pixels whose packed value is non-finite or falls outside the packed integer dtype’s range: the cast would silently wrap them, so the write is refused with ValueError instead (issue #3260). The write itself stays atomic (temp file plus rename), so no partial output is left at the destination path.

  • color_ramp (str, bool, or None, default None) – [advanced] Write best-practice symbology sidecars so a continuous single-band raster opens in QGIS with a color ramp instead of a flat grayscale stretch. Pass a ramp name ('viridis', 'plasma', 'magma', 'inferno', 'cividis', 'greys', 'spectral', or 'terrain') or True for the default 'viridis'; an unknown name raises ValueError. 'greys' follows matplotlib’s light-to-dark orientation (low values render light). Two sidecars are written: a QGIS .qml style (<base>.qml) with a singlebandpseudocolor renderer, and STATISTICS_MINIMUM/MAXIMUM/MEAN/STDDEV in the PAM <file>.aux.xml. No-op for a categorical raster (one with attrs['category_names'], which gets the RAT sidecar instead), a multiband array, a file-like destination, or data with no finite values. Computing the statistics is a separate reduction pass over the data; for a dask source that means reading the graph once more (color_ramp_range skips it). Ignored when pack=True, whose on-disk packed values would not match a ramp built from the logical values.

  • color_ramp_range (tuple of (float, float) or None, default None) – [advanced] Explicit (min, max) for the color_ramp stretch. Skips the statistics reduction – useful for large dask graphs – so only STATISTICS_MINIMUM / STATISTICS_MAXIMUM are written (mean/stddev need the pass it avoids). Ignored when color_ramp is not set.
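
The codec tiers and level ranges described under compression and compression_level can be summarised as one validation function. This is a sketch, not xrspatial's code: check_compression is hypothetical, and GeoTIFFFallbackWarning is redefined locally as a stand-in so the example runs without xrspatial installed.

```python
import warnings

STABLE = {"none", "deflate", "lzw", "packbits", "zstd"}
EXPERIMENTAL = {"lerc", "jpeg2000", "j2k", "lz4"}
LEVEL_RANGES = {"deflate": (1, 9), "zstd": (1, 22), "lz4": (0, 16)}


class GeoTIFFFallbackWarning(UserWarning):
    """Local stand-in for xrspatial's warning class, so the sketch runs on its own."""


def check_compression(compression, compression_level=None, *,
                      allow_experimental_codecs=False, allow_internal_only_jpeg=False):
    """Hypothetical validation of compression= and compression_level=."""
    if compression == "jpeg":
        # Internal-only is stricter than experimental: only its own flag unlocks it.
        if not allow_internal_only_jpeg:
            raise ValueError("compression='jpeg' requires allow_internal_only_jpeg=True")
        warnings.warn("JPEG output is readable only by xrspatial", GeoTIFFFallbackWarning)
    elif compression in EXPERIMENTAL:
        if not allow_experimental_codecs:
            raise ValueError(
                f"compression={compression!r} requires allow_experimental_codecs=True")
        warnings.warn(f"{compression!r} is an experimental codec", GeoTIFFFallbackWarning)
    elif compression not in STABLE:
        raise ValueError(f"unknown compression {compression!r}")
    # Codecs without a level concept accept any level and ignore it.
    if compression_level is not None and compression in LEVEL_RANGES:
        lo, hi = LEVEL_RANGES[compression]
        if not lo <= compression_level <= hi:
            raise ValueError(f"compression_level must be {lo}-{hi} for {compression!r}")
    return compression
```

Keeping the two opt-in flags in separate branches means allow_experimental_codecs=True can never unlock 'jpeg' by accident.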
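
The accepted predictor= spellings and horizontal differencing (TIFF Predictor 2) can be illustrated with NumPy. The helper names below are hypothetical. The sketch shows the encode step applied before compression and the cumulative-sum decode step; the floating-point predictor 3 is not modelled.

```python
import numpy as np


def normalize_predictor(predictor):
    """Map predictor= spellings onto TIFF Predictor tag values (hypothetical helper)."""
    # Check bool first: True == 1 in Python, but predictor=True means Predictor 2.
    if isinstance(predictor, bool):
        return 2 if predictor else 1
    if predictor in (0, 1):
        return 1
    if predictor in (2, 3):
        return int(predictor)
    raise ValueError(f"unsupported predictor {predictor!r}")


def horizontal_difference(tile):
    """Predictor 2: store each sample minus its left neighbour, row by row."""
    out = tile.copy()
    # Unsigned subtraction wraps modulo 2**bits, as the TIFF predictor expects.
    out[:, 1:] = tile[:, 1:] - tile[:, :-1]
    return out


def undo_horizontal_difference(diff):
    """Decoder side: a cumulative sum along each row restores the samples."""
    return np.cumsum(diff, axis=1, dtype=diff.dtype)


tile = np.array([[100, 101, 103, 103], [7, 5, 5, 9]], dtype=np.uint16)
residuals = horizontal_difference(tile)  # smooth rows become small, repetitive numbers
```

Smooth integer rasters turn into many small residuals, which is why deflate and zstd compress them better after differencing.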
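
The tiled, tile_size, cog, and overview_levels rules can be sketched as plain checks. All three functions are hypothetical. One loud assumption: the text does not say which raster side the "falls below tile_size" test uses for auto-generated levels, so this sketch uses the longer side.

```python
import warnings


def check_layout(tiled=True, tile_size=256, cog=False):
    """Hypothetical checks mirroring the tiled= / tile_size= / cog= rules."""
    if cog and not tiled:
        raise ValueError("cog=True requires tiled=True")
    if tiled and (tile_size <= 0 or tile_size % 16 != 0):
        raise ValueError("tile_size must be a positive multiple of 16")
    if not tiled and tile_size != 256:
        warnings.warn("tile_size is ignored when tiled=False")


def auto_overview_levels(height, width, tile_size=256, max_levels=8):
    """Sketch of overview_levels=None: [2, 4, 8, ...] until a level drops below tile_size."""
    side = max(height, width)  # assumption: test the longer raster side
    levels, factor = [], 2
    while len(levels) < max_levels and side / factor >= tile_size:
        levels.append(factor)
        factor *= 2
    return levels


def validate_overview_levels(levels):
    """Each entry a power-of-two integer >= 2, and the list strictly increasing."""
    previous = 1
    for level in levels:
        if (not isinstance(level, int) or level < 2
                or level & (level - 1) or level <= previous):
            raise ValueError(f"invalid overview_levels: {levels!r}")
        previous = level


print(auto_overview_levels(4096, 4096))  # [2, 4, 8, 16]
```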
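
For bigtiff=None, the auto-promotion decision reduces to a size estimate compared against the largest offset a classic TIFF can hold. The estimate below (uncompressed payload times a safety margin) is an assumption for illustration. needs_bigtiff is hypothetical, and xrspatial's own estimate may weigh compression or overviews differently.

```python
CLASSIC_TIFF_MAX_OFFSET = 2**32 - 1  # classic TIFF offsets are unsigned 32-bit


def needs_bigtiff(height, width, bands, itemsize, bigtiff=None, margin=1.1):
    """Hypothetical bigtiff= resolution: an explicit value wins, None estimates the size."""
    if bigtiff is not None:
        return bool(bigtiff)
    # Assumption: uncompressed payload plus a margin for headers, tile tables and
    # overviews. Ignoring compression makes the estimate err toward BigTIFF.
    estimated_bytes = height * width * bands * itemsize * margin
    return estimated_bytes > CLASSIC_TIFF_MAX_OFFSET


print(needs_bigtiff(40_000, 40_000, bands=1, itemsize=4))  # True: about 7 GB estimated
```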
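
The max_z_error bound abs(decoded - original) <= max_z_error follows from quantising with a step of 2 * max_z_error, because rounding moves a value by at most half a step. The sketch below is a simplification for illustration, not LERC itself; quantize_within and check_max_z_error are hypothetical helpers.

```python
import numpy as np


def check_max_z_error(compression, max_z_error):
    """A non-zero max_z_error is only meaningful for LERC."""
    if max_z_error != 0.0 and compression != "lerc":
        raise ValueError("max_z_error is only used with compression='lerc'")


def quantize_within(values, max_z_error):
    """Uniform quantisation with step 2 * max_z_error, so the error is <= max_z_error."""
    values = np.asarray(values, dtype=np.float64)
    if max_z_error == 0.0:
        return values.copy()  # lossless: nothing is approximated
    step = 2.0 * max_z_error
    zmin = values.min()
    codes = np.round((values - zmin) / step)  # small integers, cheap to encode
    return zmin + codes * step


rng = np.random.default_rng(0)
original = rng.uniform(0.0, 100.0, size=1000)
decoded = quantize_within(original, max_z_error=0.01)
print(np.abs(decoded - original).max())  # at most 0.01, up to float rounding
```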
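
The photometric modes map onto two TIFF tags: Photometric (262) and ExtraSamples (338, where 0 means unspecified and 2 means unassociated alpha). photometric_tags is a hypothetical helper. It assumes 'rgb' needs at least three bands, and it does not model ExtraSamples for the integer pass-through case or the extra_tags override.

```python
import numpy as np

MINISWHITE, MINISBLACK, RGB = 0, 1, 2       # TIFF Photometric (tag 262) values
UNSPECIFIED, UNASSOCIATED_ALPHA = 0, 2      # TIFF ExtraSamples (tag 338) values


def photometric_tags(photometric, n_bands, dtype):
    """Hypothetical mapping from photometric= to (Photometric, ExtraSamples)."""
    if isinstance(photometric, int):
        return photometric, None  # written verbatim; ExtraSamples not modelled here
    if photometric in ("auto", "minisblack"):
        # 'auto' never guesses RGB: every band past the first is "unspecified".
        return MINISBLACK, (UNSPECIFIED,) * (n_bands - 1)
    if photometric == "miniswhite":
        if np.dtype(dtype).kind == "i":
            raise NotImplementedError("signed MinIsWhite: cast to unsigned or use 'minisblack'")
        return MINISWHITE, (UNSPECIFIED,) * (n_bands - 1)
    if photometric == "rgb" and n_bands >= 3:
        return RGB, (UNSPECIFIED,) * (n_bands - 3)
    if photometric == "rgba" and n_bands >= 4:
        return RGB, (UNASSOCIATED_ALPHA,) + (UNSPECIFIED,) * (n_bands - 4)
    raise ValueError(f"photometric={photometric!r} is invalid for {n_bands} band(s)")


print(photometric_tags("auto", 4, "uint16"))  # R, G, B, NIR stays MinIsBlack + 3 extras
```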
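
The crs and allow_unparseable_crs rules can be sketched as a classifier. classify_crs is hypothetical. Its can_resolve argument stands in for a pyproj lookup, and the default models a host without pyproj. The order of the checks is also an assumption; the fail-closed outcome does not depend on it.

```python
import re

import numpy as np

# Structural WKT check: the string must start with one of these root keywords.
WKT_ROOT = re.compile(r"\s*(PROJCS|GEOGCS|PROJCRS|GEOGCRS)\s*\[")


def classify_crs(crs, can_resolve=lambda s: False, allow_unparseable_crs=False):
    """Hypothetical sketch of the crs= fail-closed rule."""
    if isinstance(crs, (int, np.integer)) and not isinstance(crs, bool):
        return "epsg"
    if not isinstance(crs, str):
        raise TypeError(f"unsupported crs type: {type(crs).__name__}")
    if WKT_ROOT.match(crs):
        return "wkt"          # WKT-only path: code 32767 plus GTCitationGeoKey
    if can_resolve(crs):
        return "resolvable"
    if allow_unparseable_crs:
        return "unparseable"  # permissive: written verbatim into GTCitationGeoKey
    raise ValueError(f"cannot validate crs={crs!r}; pass allow_unparseable_crs=True")


print(classify_crs(np.int32(32633)))  # 'epsg'
```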
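
The pack=True steps (undo scale and offset, fill NaN with the sentinel, refuse values a cast would wrap) can be sketched with NumPy. pack_values is hypothetical. Rounding to nearest before the integer cast is an assumption, and the attrs lookup and dtype fallbacks described above are left out.

```python
import numpy as np


def pack_values(logical, scale_factor, add_offset, nodata, dtype):
    """Hypothetical inverse of an unpack read: undo scale/offset, fill NaN, cast."""
    dtype = np.dtype(dtype)
    logical = np.asarray(logical, dtype=np.float64)
    nan_mask = np.isnan(logical)
    packed = (logical - add_offset) / scale_factor
    if dtype.kind in "iu":
        packed = np.round(packed)  # assumption: round to nearest before the cast
        info = np.iinfo(dtype)
        valid = packed[~nan_mask]
        # A wrapping cast would silently corrupt these pixels, so refuse instead.
        bad = ~np.isfinite(valid) | (valid < info.min) | (valid > info.max)
        if bad.any():
            raise ValueError("packed values are non-finite or out of range for " + str(dtype))
        if nan_mask.any():
            if nodata is None:
                raise ValueError("NaN pixels but no nodata sentinel to fill them")
            packed[nan_mask] = nodata
    elif nodata is not None:
        packed[nan_mask] = nodata
    return packed.astype(dtype)


print(pack_values([0.0, 12.5, np.nan], 0.5, 0.0, nodata=-9999, dtype="int16"))
```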
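
The statistics behind color_ramp, and the shortcut color_ramp_range takes, can be sketched as follows. ramp_statistics is hypothetical and returns a plain dict instead of writing .qml or .aux.xml sidecars. It assumes nodata pixels are already NaN and uses the population standard deviation (ddof=0), which may differ from what xrspatial writes.

```python
import numpy as np

RAMPS = {"viridis", "plasma", "magma", "inferno", "cividis", "greys", "spectral", "terrain"}


def ramp_statistics(values, color_ramp=True, color_ramp_range=None):
    """Hypothetical helper: the STATISTICS_* values a color_ramp sidecar would carry."""
    if color_ramp is None or color_ramp is False:
        return None
    name = "viridis" if color_ramp is True else color_ramp
    if name not in RAMPS:
        raise ValueError(f"unknown color_ramp {name!r}")
    if color_ramp_range is not None:
        # Explicit range: no reduction pass, so only min and max are known.
        low, high = color_ramp_range
        return {"ramp": name, "STATISTICS_MINIMUM": float(low), "STATISTICS_MAXIMUM": float(high)}
    finite = np.asarray(values, dtype=np.float64).ravel()
    finite = finite[np.isfinite(finite)]
    if finite.size == 0:
        return None  # no finite values: no sidecars
    return {
        "ramp": name,
        "STATISTICS_MINIMUM": float(finite.min()),
        "STATISTICS_MAXIMUM": float(finite.max()),
        "STATISTICS_MEAN": float(finite.mean()),
        "STATISTICS_STDDEV": float(finite.std()),  # assumption: population std (ddof=0)
    }
```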

Returns:

The path argument (a string for filesystem paths, the file-like object for BytesIO destinations). Returning the path lines up with _build_vrt and lets callers chain a write into a read without round-tripping through a variable; existing callers that discarded the previous None return are unaffected.

Return type:

str or binary file-like

Raises:
  • ValueError – If data.attrs['transform'] is a rotated or skewed affine (b != 0 or d != 0 in rasterio Affine ordering). The on-disk GeoTIFF is axis-aligned; reproject onto an axis-aligned grid first.

  • ValueError – If data.attrs['rotated_affine'] is set and drop_rotation is False. The reader sets that attr on the allow_rotated=True path; the writer cannot emit a ModelTransformationTag so writing would silently lose the rotation. Pass drop_rotation=True to accept the loss explicitly.

  • ValueError – If data is a 3D DataArray whose dims are neither (band, y, x) nor (y, x, band) (accepting the band-name aliases bands / channel and spatial-name aliases lat / lon / latitude / longitude / row / col). A leading non-band dim such as time is rejected because the writer cannot infer the band axis from arbitrary names and used to silently treat the leading axis as y.
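
The three ValueError cases above can be pictured as pre-write checks. check_writable is hypothetical. It assumes attrs['transform'] is a sequence in rasterio Affine order (a, b, c, d, e, f), and it assumes the spatial aliases pair up as lat / latitude / row for y and lon / longitude / col for x.

```python
BAND_DIMS = {"band", "bands", "channel"}
Y_DIMS = {"y", "lat", "latitude", "row"}
X_DIMS = {"x", "lon", "longitude", "col"}


def check_writable(dims, attrs, drop_rotation=False):
    """Hypothetical pre-write checks mirroring the ValueError cases listed above."""
    transform = attrs.get("transform")
    if transform is not None:
        # rasterio Affine order: x = a*col + b*row + c, y = d*col + e*row + f
        a, b, c, d, e, f = tuple(transform)[:6]
        if b != 0 or d != 0:
            raise ValueError("rotated or skewed transform: reproject onto an axis-aligned grid")
    if "rotated_affine" in attrs and not drop_rotation:
        raise ValueError("attrs['rotated_affine'] is set: pass drop_rotation=True to accept the loss")
    if len(dims) == 3:
        first, second, third = dims
        band_first = first in BAND_DIMS and second in Y_DIMS and third in X_DIMS
        band_last = first in Y_DIMS and second in X_DIMS and third in BAND_DIMS
        if not (band_first or band_last):
            raise ValueError(f"3D dims must be (band, y, x) or (y, x, band), got {tuple(dims)}")


check_writable(("band", "y", "x"), {"transform": (30.0, 0.0, 500000.0, 0.0, -30.0, 4200000.0)})
```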

Examples

Write a DataArray to a plain GeoTIFF and read it back:

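>>> import numpy as np
>>> import xarray as xr
>>> from xrspatial.geotiff import open_geotiff, to_geotiff
>>> data = xr.DataArray(np.random.rand(512, 512).astype('float32'), dims=('y', 'x'), coords={'y': np.arange(512)[::-1] * 30.0, 'x': np.arange(512) * 30.0}, attrs={'crs': 32633})
>>> path = to_geotiff(data, 'elevation.tif')
>>> da = open_geotiff(path)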

Write a Cloud Optimized GeoTIFF (tiled, with internal overviews):

>>> to_geotiff(data, 'elevation_cog.tif', cog=True)
'elevation_cog.tif'

Write a VRT mosaic. A .vrt output path tiles the array and emits the index that references the tiles:

>>> to_geotiff(data, 'mosaic.vrt')
'mosaic.vrt'