..  _user_guide.data_types:

************************
Data Type Handling
************************

This guide explains how xarray-spatial handles floating-point data types (float32 vs float64) and the implications for memory usage, precision, and performance.

Overview
========

xarray-spatial standardizes on **float32** (32-bit floating-point) for output data in most analytical functions. This design decision provides a balance between computational precision, memory efficiency, and performance that is well-suited for raster analysis tasks.

.. warning::

   All multispectral indices, terrain analysis functions, and most other
   operations in xarray-spatial output **float32** data, regardless of input
   data type.  If you pass float64 data, the output is float32.  Cast the
   result with ``.astype('float64')`` if you need higher precision.


Why Float32?
============

The decision to use float32 as the standard output type is based on several considerations:

Memory Efficiency
-----------------

Float32 uses half the memory of float64:

For large rasters or when working with multiple bands, the memory savings can be substantial.

Adequate Precision
------------------

Float32 provides approximately 7 significant decimal digits of precision, which is more than sufficient for most geospatial analysis tasks:

- **Vegetation indices** (NDVI, EVI, SAVI, etc.): Values typically range from -1.0 to +1.0
- **Terrain metrics** (slope, aspect, curvature): Values are typically expressed in degrees or simple ratios
- **Spectral indices**: Results are normalized ratios with limited dynamic range

Industry Standard
-----------------

Float32 is the de facto standard in the remote sensing and GIS industry:

- **GDAL** defaults to float32 for many raster operations
- **QGIS** raster calculators commonly output float32
- **Satellite imagery** (Landsat, Sentinel) is distributed in integer formats that easily fit within float32 precision


Input Data Type Handling
========================

xarray-spatial accepts a wide variety of input data types and automatically converts them to float32 for calculations:

.. code-block:: python

   import numpy as np
   import xarray as xr
   from xrspatial.multispectral import ndvi

   # Integer input (common for raw satellite imagery)
   nir_uint16 = xr.DataArray(np.array([[1000, 1500], [2000, 2500]], dtype=np.uint16))
   red_uint16 = xr.DataArray(np.array([[500, 700], [800, 900]], dtype=np.uint16))

   # Output will be float32
   result = ndvi(nir_agg=nir_uint16, red_agg=red_uint16)
   print(result.dtype)  # float32

   # Float64 input
   nir_float64 = xr.DataArray(np.array([[0.1, 0.15], [0.2, 0.25]], dtype=np.float64))
   red_float64 = xr.DataArray(np.array([[0.05, 0.07], [0.08, 0.09]], dtype=np.float64))

   # Output will still be float32
   result = ndvi(nir_agg=nir_float64, red_agg=red_float64)
   print(result.dtype)  # float32


Multispectral Functions
=======================

All multispectral indices convert input data to float32 before performing calculations and return float32 results


Example: NDVI with Different Input Types
----------------------------------------

.. code-block:: python

   import numpy as np
   import xarray as xr
   from xrspatial.multispectral import ndvi

   # Create sample data with different dtypes
   shape = (100, 100)

   # Simulate Landsat-style uint16 data (common for satellite imagery)
   nir_data = np.random.randint(5000, 15000, shape, dtype=np.uint16)
   red_data = np.random.randint(2000, 8000, shape, dtype=np.uint16)

   nir = xr.DataArray(nir_data, dims=['y', 'x'])
   red = xr.DataArray(red_data, dims=['y', 'x'])

   # Calculate NDVI - automatically converts to float32 internally
   vegetation_index = ndvi(nir_agg=nir, red_agg=red)

   print(f"Input dtype: {nir.dtype}")           # uint16
   print(f"Output dtype: {vegetation_index.dtype}")  # float32
   print(f"Output range: [{vegetation_index.min().values:.3f}, {vegetation_index.max().values:.3f}]")


Terrain Analysis Functions
==========================

Surface and terrain analysis functions follow the same float32 convention:

.. code-block:: python

   import numpy as np
   import xarray as xr
   from xrspatial import slope, aspect

   # Integer elevation data (e.g., from a DEM in meters)
   elevation = xr.DataArray(
       np.random.randint(0, 3000, (100, 100), dtype=np.int16),
       dims=['y', 'x']
   )

   # Both outputs will be float32
   slope_result = slope(elevation)
   aspect_result = aspect(elevation)

   print(f"Slope dtype: {slope_result.dtype}")    # float32
   print(f"Aspect dtype: {aspect_result.dtype}")  # float32


Focal Operations
================

Focal statistics and convolution operations also use float32:

.. code-block:: python

   from xrspatial.focal import mean, focal_stats
   from xrspatial.convolution import convolve_2d, circle_kernel

   # Integer input data
   data = xr.DataArray(np.random.randint(0, 255, (100, 100), dtype=np.uint8))

   # Focal operations convert to and output float32
   kernel = circle_kernel(1, 1, 3)
   smoothed = convolve_2d(data, kernel)
   print(f"Convolution output dtype: {smoothed.dtype}")  # float32


Backend Consistency
===================

xarray-spatial ensures consistent float32 output across all computational backends:

NumPy Backend
-------------

.. code-block:: python

   import numpy as np
   import xarray as xr
   from xrspatial.multispectral import ndvi

   nir = xr.DataArray(np.random.rand(100, 100).astype(np.float64))
   red = xr.DataArray(np.random.rand(100, 100).astype(np.float64))

   result = ndvi(nir, red)
   print(result.dtype)  # float32

Dask Backend
------------

.. code-block:: python

   import dask.array as da
   import xarray as xr
   from xrspatial.multispectral import ndvi

   # Dask arrays for out-of-core computation
   nir = xr.DataArray(da.random.random((1000, 1000), chunks=(250, 250)))
   red = xr.DataArray(da.random.random((1000, 1000), chunks=(250, 250)))

   result = ndvi(nir, red)
   print(result.dtype)  # float32

CuPy Backend (GPU)
------------------

.. code-block:: python

   import cupy as cp
   import xarray as xr
   from xrspatial.multispectral import ndvi

   # CuPy arrays for GPU computation
   nir = xr.DataArray(cp.random.rand(100, 100).astype(cp.float64))
   red = xr.DataArray(cp.random.rand(100, 100).astype(cp.float64))

   result = ndvi(nir, red)
   print(result.dtype)  # float32


Best Practices
==============

1. **Don't upcast unnecessarily**: If your input data is uint8 or uint16, there's no need to convert to float64 before passing to xarray-spatial functions.

2. **Trust the output type**: The float32 output is intentional and provides adequate precision for geospatial analysis.

3. **Consider memory when scaling**: When working with large rasters or many bands, the 50% memory savings of float32 vs float64 can be significant.

4. **Chain operations efficiently**: xarray-spatial functions can be chained together without precision loss, as intermediate results maintain float32 precision.

.. code-block:: python

   from xrspatial.multispectral import ndvi, savi

   # Efficient chaining - all operations use float32 internally
   ndvi_result = ndvi(nir, red)
   savi_result = savi(nir, red)

   # Combine results (still float32)
   combined = (ndvi_result + savi_result) / 2


Dataset Input Support
=====================

Most functions accept an ``xr.Dataset`` in addition to ``xr.DataArray``.
When a Dataset is passed, the operation is applied to each data variable
independently and the result is returned as a new Dataset.

Single-input functions (surface, classification, focal, proximity):

.. code-block:: python

   from xrspatial import slope

   # Apply slope to every variable in the Dataset
   slope_ds = slope(my_dataset)
   # Returns an xr.Dataset with the same variable names

Multi-input functions (multispectral indices) accept a Dataset with keyword
arguments that map band aliases to variable names:

.. code-block:: python

   from xrspatial.multispectral import ndvi

   # Map Dataset variables to band parameters
   ndvi_result = ndvi(my_dataset, nir='band_5', red='band_4')

``zonal.stats`` also accepts a Dataset for the ``values`` parameter, returning
a merged DataFrame with columns prefixed by variable name:

.. code-block:: python

   from xrspatial.zonal import stats

   df = stats(zones, my_dataset)
   # Columns: zone, elevation_mean, elevation_max, ..., temperature_mean, ...


Summary
=======

- **Input**: xarray-spatial accepts any numeric data type (int or float), as either ``xr.DataArray`` or ``xr.Dataset``
- **Processing**: All calculations are performed in float32 precision
- **Output**: Results are returned as float32 DataArrays (or a Dataset of float32 DataArrays when a Dataset is passed)
- **Consistency**: This behavior is consistent across NumPy, Dask, and CuPy backends
- **Rationale**: Float32 provides adequate precision for geospatial analysis while using half the memory of float64