Given that most of the geospatial data we’ll work with in this tutorial is stored in GeoTIFF files, we need to know how to work with those files. The easiest solution is to use rioxarray; this solution takes care of a lot of tricky details transparently. We can also use Rasterio as a tool for reading data or metadata from GeoTIFF files; judicious use of Rasterio can make a big difference when working with remote files in the cloud.
import numpy as np
import rasterio
import rioxarray as rio
from pathlib import Path
rioxarray¶
rioxarray
is a package that extends the Xarray package (more on that later). The primary rioxarray
features we’ll use within this tutorial are:
rioxarray.open_rasterio
to load GeoTIFF files directly into XarrayDataArray
structures; andxarray.DataArray.rio
to provides useful utilities (e.g., for specifying CRS information).
To get used to working with GeoTIFF files, we’ll use a few specific examples in this & later notebooks. We’ll explain more about what kind of data is contained within the file later; for now, we just want to get used to loading data.
Loading files into a DataArray¶
Observe first that open_rasterio
works on local file paths and remote URLs.
- Predictably, local access is faster than remote access.
%%time
LOCAL_PATH = Path.cwd().parent / 'assets' / 'OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-ANOM-MAX.tif'
data = rio.open_rasterio(LOCAL_PATH)
%%time
REMOTE_URL ='https://opera-provisional-products.s3.us-west-2.amazonaws.com/DIST/DIST_HLS/WG/DIST-ALERT/McKinney_Wildfire/OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1/OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-ANOM-MAX.tif'
data_remote = rio.open_rasterio(REMOTE_URL)
This next operation compares elements of an Xarray DataArray
elementwise (the use of the .all
method is similar to what we would do to compare NumPy arrays). This is generally not an advisable way to compare arrays, series, dataframes, or other large data structures that contain floating-point data. However, in this particular instance, as the two data structures have been read from the same file stored in two different locations, element-by-element comparison makes sense. It confirms that the data loaded into memory from two different sources is identical in every bit.
(data_remote == data).all() # Verify that the data is identical from both sources
rasterio¶
This section can be safely skipped if rioxarray
works adequately for our analyses, i.e., if loading data into memory is not prohibitive; when that is not the case, rasterio
provides alternative strategies for exploring GeoTIFF files. That is, the rasterio
package offers lower-level approaches to loading data than rioxarray
does when needed.
From the Rasterio documentation:
Before Rasterio there was one Python option for accessing the many different kind of raster data files used in the GIS field: the Python bindings distributed with the Geospatial Data Abstraction Library, GDAL. These bindings extend Python, but provide little abstraction for GDAL’s C API. This means that Python programs using them tend to read and run like C programs. For example, GDAL’s Python bindings require users to watch out for dangling C pointers, potential crashers of programs. This is bad: among other considerations we’ve chosen Python instead of C to avoid problems with pointers.
What would it be like to have a geospatial data abstraction in the Python standard library? One that used modern Python language features and idioms? One that freed users from concern about dangling pointers and other C programming pitfalls? Rasterio’s goal is to be this kind of raster data library – expressing GDAL’s data model using fewer non-idiomatic extension classes and more idiomatic Python types and protocols, while performing as fast as GDAL’s Python bindings.
High performance, lower cognitive load, cleaner and more transparent code. This is what Rasterio is about.
Opening files with rasterio.open¶
# Show rasterio.open works using context manager
LOCAL_PATH = Path.cwd().parent / 'assets' / 'OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-ANOM-MAX.tif'
print(LOCAL_PATH)
Given a data source (e.g., a GeoTIFF file in the current context), we can open a DatasetReader
object associated with using rasterio.open
. Technically, we have to remember to close the object afterward. That is, our code would look like this:
ds = rasterio.open(LOCAL_PATH)
# ..
# do some computation
# ...
ds.close()
As with file-handling in Python, we can use a context manager (i.e., a with
clause) instead.
with rasterio.open(LOCAL_PATH) as ds:
# ...
# do some computation
# ...
# more code outside the scope of the with block.
The dataset will be closed automatically outside the with
block.
with rasterio.open(LOCAL_PATH) as ds:
print(f'{type(ds)=}')
assert not ds.closed
# outside the scope of the with block
assert ds.closed
The principal advantage of using rasterio.open
rather than rioxarray.open_rasterio
to open a file is that the latter approach opens the file and immediately loads its contents into a DataDarray
in memory.
By contrast, using rasterio.open
opens the file in place and its contents are not immediately loaded into memory. The file’s data can be read, but this must be done explicitly. This makes a lot of difference when working with remote data; transferring the entire contents across a network involves certain costs. For example, if we examine the metadata—which is typically much smaller and can be transferred quickly—we may find, e.g., that moving an entire array of data across the network is not warranted.
Examining DatasetReader attributes¶
When a file is opened using rasterio.open
, the object instantiated is from the DatasetReader
class. This class has a number of attributes and methods of interest to us:
profile | height | width |
shape | count | nodata |
crs | transform | bounds |
xy | index | read |
First, given a DatasetReader
ds
associated with a data source, examining ds.profile
returns some diagnostic information.
with rasterio.open(LOCAL_PATH) as ds:
print(f'{ds.profile=}')
The attributes ds.height
, ds.width
, ds.shape
, ds.count
, ds.nodata
, and ds.transform
are all included in the output from ds.profile
but are also accessible individually.
with rasterio.open(LOCAL_PATH) as ds:
print(f'{ds.height=}')
print(f'{ds.width=}')
print(f'{ds.shape=}')
print(f'{ds.count=}')
print(f'{ds.nodata=}')
print(f'{ds.crs=}')
print(f'{ds.transform=}')
Reading data into memory¶
The method ds.read
loads an array from the data file into memory. Notice this can be done on local or remote files.
%%time
with rasterio.open(LOCAL_PATH) as ds:
array = ds.read()
print(f'{array.shape=}')
%%time
with rasterio.open(REMOTE_URL) as ds:
array = ds.read()
print(f'{array.shape=}')
print(f'{type(array)=}')
The array loaded into memory with ds.read
is a NumPy array. This can be wrapped by an Xarray DataArray
if we provide additional code to specify the coordinate labels and so on.
Mapping coordinates¶
Earlier, we loaded data from a local file into a DataArray
called da
using rioxarray.open_rasterio
.
da = rio.open_rasterio(LOCAL_PATH)
da
Doing so simplified the loading raster data from a GeoTIFF file into an Xarray DataArray
while filling in the metadata for us. In particular, the coordinates associated with the pixels were stored into da.coords
(the default coordinate axes are band
, x
, and y
for this three-dimensional array).
If we ignore the extra band
dimension, the pixels of the raster data are associated with pixel coordinates (integers) and spatial coordinates (real values, typically a point at the centre of each pixel).
(from http://ioam.github.io/topographica
)
The accessors da.isel
and da.sel
allow us to extract slices from the data array using pixel coordinates or spatial coordinates respectively.
If we use rasterio.open
to open a file, the DatasetReader
attribute transform
provides the details of how to convert between pixel and spatial coordinates. We will use this capability in some of the case studies later.
with rasterio.open(LOCAL_PATH) as ds:
print(f'{ds.transform=}')
print(f'{np.abs(ds.transform[0])=}')
print(f'{np.abs(ds.transform[4])=}')
The attribute ds.transform
is an object describing an affine transformation (represented above as a matrix). Notice that the absolute values of the diagonal entries of the matrix ds.transform
give the spatial dimensions of pixels ( in this case).
We can also use this object to convert pixel coordinates to the corresponding spatial coordinates.
with rasterio.open(LOCAL_PATH) as ds:
print(f'{ds.transform * (0,0)=}') # top-left pixel
print(f'{ds.transform * (0,3660)=}') # bottom-left pixel
print(f'{ds.transform * (3660,0)=}') # top-right pixel
print(f'{ds.transform * (3660,3660)=}') # bottom-right pixel
The attribute ds.bounds
displays the bounds of the spatial region (left, bottom, right, top).
with rasterio.open(LOCAL_PATH) as ds:
print(f'coordinate bounds: {ds.bounds=}')
The method ds.xy
also converts integer index coordinates to continuous coordinates. Notice that ds.xy
maps integers to the centre of pixels. The loops below print out the first top left corner of the coordinates in pixel coordinates and then the cooresponding spatial coordinates.
with rasterio.open(LOCAL_PATH) as ds:
for k in range(3):
for l in range(4):
print(f'({k:2d},{l:2d})','\t', end='')
print()
print()
for k in range(3):
for l in range(4):
e,n = ds.xy(k,l)
print(f'({e},{n})','\t', end='')
print()
print()
ds.index
does the reverse: given spatial coordinates (x,y)
, it returns the integer indices of the pixel that contains that point.
with rasterio.open(LOCAL_PATH) as ds:
print(ds.index(500000, 4700015))
These conversions can be tricky, particularly because pixel coordinates map to the centres of the pixels and also because the second y
spatial coordinate decreases as the second pixel coordinate increases. Keeping track of tedious details like this is partly why loading from rioxarray
is useful, i.e., this is done for us. But it is worth knowing that we can reconstruct this mapping if needed from metadata in the GeoTIFF file (we use this fact later).
with rasterio.open(LOCAL_PATH) as ds:
print(ds.bounds)
print(ds.transform * (0.5,0.5)) # Maps to centre of top left pixel
print(ds.xy(0,0)) # Same as above
print(ds.transform * (0,0)) # Maps to top left corner of top left pixel
print(ds.xy(-0.5,-0.5)) # Same as above
print(ds.transform[0], ds.transform[4])