Lake Mead is a water reservoir in southwestern United States and is significant for irrigation in neighboring states. The lake has experienced significant drought over the past decade and particularly between 2020 & 2023. In this notebook, we’ll find GeoTIFF data related to this lake and synthesize several raster files to produce a plot.
Outline of steps for analysis¶
- Identifying search parameters
- AOI, time-window
- Endpoint, Provider, catalog identifier (“short name”)
- Obtaining search results
- Instrospect, examine to identify features, bands of interest
- Wrap results into a DataFrame for easier exploration
- Exploring & refining search results
- Identify granules of highest value
- Filter extraneous granules with minimal contribution
- Assemble relevant filtered granules into DataFrame
- Identify kind of output to generate
- Data-wrangling to produce relevant output
- Download relevant granules into Xarray DataArray, stacked appropriately
- Do intermediate computations as necessary
- Assemble relevant data slices into visualization
Preliminary imports¶
from warnings import filterwarnings
filterwarnings('ignore')
import numpy as np, pandas as pd, xarray as xr
import rioxarray as rio
# Imports for plotting
import hvplot.pandas, hvplot.xarray
import geoviews as gv
from geoviews import opts
gv.extension('bokeh')
from bokeh.models import FixedTicker
# STAC imports to retrieve cloud data
from pystac_client import Client
from osgeo import gdal
# GDAL setup for accessing cloud data
gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE','~/.cookies.txt')
gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/.cookies.txt')
gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN','EMPTY_DIR')
gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF, TIFF')
Convenient utilities¶
These functions could be placed in module files for more developed research projects. For learning purposes, they are embedded within this notebook.
# simple utility to make a rectangle with given center of width dx & height dy
def make_bbox(pt,dx,dy):
'''Returns bounding-box represented as tuple (x_lo, y_lo, x_hi, y_hi)
given inputs pt=(x, y), width & height dx & dy respectively,
where x_lo = x-dx/2, x_hi=x+dx/2, y_lo = y-dy/2, y_hi = y+dy/2.
'''
return tuple(coord+sgn*delta for sgn in (-1,+1) for coord,delta in zip(pt, (dx/2,dy/2)))
# simple utility to plot an AOI or bounding-box
def plot_bbox(bbox):
'''Given bounding-box, returns GeoViews plot of Rectangle & Point at center
+ bbox: bounding-box specified as (lon_min, lat_min, lon_max, lat_max)
Assume longitude-latitude coordinates.
'''
# These plot options are fixed but can be over-ridden
point_opts = opts.Points(size=12, alpha=0.25, color='blue')
rect_opts = opts.Rectangles(line_width=0, alpha=0.1, color='red')
lon_lat = (0.5*sum(bbox[::2]), 0.5*sum(bbox[1::2]))
return (gv.Points([lon_lat]) * gv.Rectangles([bbox])).opts(point_opts, rect_opts)
# utility to extract search results into a Pandas DataFrame
def search_to_dataframe(search):
'''Constructs Pandas DataFrame from PySTAC Earthdata search results.
DataFrame columns are determined from search item properties and assets.
'asset': string identifying an Asset type associated with a granule
'href': data URL for file associated with the Asset in a given row.'''
granules = list(search.items())
assert granules, "Error: empty list of search results"
props = list({prop for g in granules for prop in g.properties.keys()})
tile_ids = map(lambda granule: granule.id.split('_')[3], granules)
rows = (([g.properties.get(k, None) for k in props] + [a, g.assets[a].href, t])
for g, t in zip(granules,tile_ids) for a in g.assets )
df = pd.concat(map(lambda x: pd.DataFrame(x, index=props+['asset','href', 'tile_id']).T, rows),
axis=0, ignore_index=True)
assert len(df), "Empty DataFrame"
return df
# utility to remap pixel values to a sequence of contiguous integers
def relabel_pixels(data, values, null_val=255, transparent_val=0, replace_null=True, start=0):
"""
This function accepts a DataArray with a finite number of categorical values as entries.
It reassigns the pixel labels to a sequence of consecutive integers starting from start.
data: Xarray DataArray with finitely many categories in its array of values.
null_val: (default 255) Pixel value used to flag missing data and/or exceptions.
transparent_val: (default 0) Pixel value that will be fully transparent when rendered.
replace_null: (default True) Maps null_value->transparent_value everywhere in data.
start: (default 0) starting range of consecutive integer values for new labels.
The values returned are:
new_data: Xarray DataArray containing pixels with new values
relabel: dictionary associating old pixel values with new pixel values
"""
new_data = data.copy(deep=True)
if values:
values = np.sort(np.array(values, dtype=np.uint8))
else:
values = np.sort(np.unique(data.values.flatten()))
if replace_null:
new_data = new_data.where(new_data!=null_val, other=transparent_val)
values = values[np.where(values!=null_val)]
n_values = len(values)
new_values = np.arange(start=start, stop=start+n_values, dtype=values.dtype)
assert transparent_val in new_values, f"{transparent_val=} not in {new_values}"
relabel = dict(zip(values, new_values))
for old, new in relabel.items():
if new==old: continue
new_data = new_data.where(new_data!=old, other=new)
return new_data, relabel
Identifying search parameters¶
We’ll identify a geographic point near the north shore of Lake Mead, make a bounding box, and choose a date range that covers Marh and part of April 2023.
lake_mead = (-114.754, 36.131)
AOI = make_bbox(lake_mead, 0.1, 0.1)
DATE_RANGE = "2023-03-01/2023-04-15"
# Optionally plot the AOI
basemap = gv.tile_sources.OSM(width=500, height=500, padding=0.1)
plot_bbox(AOI) * basemap
search_params = dict(bbox=AOI, datetime=DATE_RANGE)
print(search_params)
Obtaining search results¶
As usual, we’ll specify the search endpoint, provider, & catalog. For the DSWx family of data products these are as follows.
ENDPOINT = 'https://cmr.earthdata.nasa.gov/stac'
PROVIDER = 'POCLOUD'
COLLECTIONS = ["OPERA_L3_DSWX-HLS_V1_1.0"]
# Update the dictionary opts with list of collections to search
search_params.update(collections=COLLECTIONS)
print(search_params)
catalog = Client.open(f'{ENDPOINT}/{PROVIDER}/')
search_results = catalog.search(**search_params)
We convert the search results to a DataFrame
for easier perusal.
df = search_to_dataframe(search_results)
display(df.head())
df.info()
We’ll clean the DataFrame df
in standard ways by:
- dropping the
start_datetime
&end_datetime
columns; - renaming the
eo:cloud_cover
column; - converting the columns to suitable datatypes; and
- assigning the
datetime
column as the index.
df.datetime = pd.DatetimeIndex(df.datetime)
df = df.drop(['start_datetime', 'end_datetime'], axis=1)
df = df.rename({'eo:cloud_cover':'cloud_cover'}, axis=1)
df['cloud_cover'] = df['cloud_cover'].astype(np.float16)
for col in ['asset', 'href', 'tile_id']:
df[col] = df[col].astype(pd.StringDtype())
df = df.set_index('datetime').sort_index()
display(df.head())
df.info()
Exploring & refining search results¶
We can look at the assets
column to see which different bands are available in the results returned.
df.asset.value_counts()
The 0_B01_WTR
band is the one that we want to work with later.
We can also see how much cloud cover there is in our search results.
df.cloud_cover.agg(['min','mean','median','max'])
We can extract selected rows from the DataFrame
using boolean Series
. Specifically, we’ll select the rows that have less than 10% cloud cover and in which the asset
is the 0_B01_WTR
band.
c1 = (df.cloud_cover <= 10)
c2 = (df.asset.str.contains('B01_WTR'))
b01_wtr = df.loc[c1 & c2].drop(['asset', 'cloud_cover'], axis=1)
b01_wtr
Finally, we can see how many different MGRS tiles intersect our AOI.
b01_wtr.tile_id.value_counts()
There are four distinct geographic tiles that intersect this particular AOI.
Data-wrangling to produce relevant output¶
This time, we’ll use a technique called mosaicking to combine raster data from adjacent tiles into a single raster data set. This requires the rasterio.merge.merge
function as before. We’ll also need the function rasterio.transform.array_bounds
to get the coordinates aligned properly.
import rasterio
from rasterio.merge import merge
from rasterio.transform import array_bounds
We’ve used the function merge
before to combine distinct raster data sets associated with a single MGRS tile. This time, the raster data merged will come from adjacent MGRS tiles. In calling the merge
function in the next code cell, the column b01_wtr.href
will be treated as a list of URLs (Uniform Resource Locators). For each URL in that list, a GeoTIFF file will be downloaded and processed. The net result is a mosaicked image, i.e., a merged raster that contains a combination of all the rasters. The specifics of the merging algorithm are described in the rasterio.merge
module documentation.
%%time
mosaicked_img, mosaic_transform = merge(b01_wtr.href)
The output again consists of a NumPy array and coordinate transformation.
print(f"{type(mosaicked_img)=}\n")
print(f"{mosaicked_img.shape=}\n")
print(f"{type(mosaic_transform)=}\n")
print(f"{mosaic_transform=}\n")
The entries of mosaic_transform
describe an affine transformation from pixel coordinates to continuous UTM coordinates. In particular:
- the entry
mosaic_transform[0]
is the horizontal width of each pixel in metres; and - the entry
mosaic_transform[4]
is the vertical height of each pixel in metres.
Notice also that, in this instance, mosaic_transform[4]
is a negative value (i.e., mosaic_transform[4]==-30.0
). This tells us that the orientation of the continuous coordinate vertical axis opposes the orientation of the vertical pixel coordinate axis, i.e., the vertical continuous coordinate decreases in a downwards direction unlike the vertical pixel coordinate.
When we pass the object mosaic_transform
into the rasterio.transform.array_bounds
function, the value returned is a bounding-box, i.e., a tuple of the form (x_min, y_min, x_max, y_max)
describing the lower-left and upper-right corners of the resulting mosaicked image in continuous UTM coordinates.
bounds = array_bounds(*mosaicked_img.shape[1:], mosaic_transform)
bounds
Combining all the preceding information allows us to reconstruct the continuous UTM coordinates associated with each pixel. We’ll compute arrays for these continuous coordinates and label them longitude
and latitude
. These coordinates would more accurately be called easting
and northing
, but we’ll use the labels longitude
and latitude
respectively when we attach the coordinate arrays to an Xarray DataArray
. We choose these labels because, when the raster data is plotted with hvplot.image
, the output will use longitude-latitude coordinates.
longitude = np.arange(bounds[0], bounds[2], mosaic_transform[0])
latitude = np.arange(bounds[3], bounds[1], mosaic_transform[4])
We’ll wrap the mosaicked image and the relevant metadata into an Xarray DataArray
.
raster = xr.DataArray(
data=mosaicked_img,
dims=["band", "latitude", "longitude"],
coords=dict(
longitude=(["longitude"], longitude),
latitude=(["latitude"], latitude),
),
attrs=dict(
description="OPERA DSWx B01",
units=None,
),
)
raster
We need to attach a CRS object to the raster
object. To do so, we’ll use rasterio.open
to load the relevant metadata from one of the granules associated with b01_wtr
(these should be the same for all these particular files).
with rasterio.open(b01_wtr.href[0]) as ds:
crs = ds.crs
raster.rio.write_crs(crs, inplace=True)
print(raster.rio.crs)
In research code, we could bundle the preceding commands within a function and save that to a module. We won’t do that here because, for the purposes of this tutorial, it’s preferable to make sure that we can examine the output of various lines of code interactively.
With all the preceding steps completed, we’re ready to produce a plot of the mosaicked raster. We’ll relabel the pixel values so that the colorboar in the final result will be tidier.
raster, relabel = relabel_pixels(raster, values=[0,1,2,253,254,255])
We’ll define image options, layout options, & a colormap in dictionaries as we’ve done previously to produce a single plot.
image_opts = dict(
x='longitude',
y='latitude',
rasterize=True,
dynamic=True,
crs=raster.rio.crs
)
layout_opts = dict(
xlabel='Longitude',
ylabel='Latitude',
)
# Define a colormap using RGBA values; these need to be written manually here...
COLORS = {
0: (255, 255, 255, 0.0), # No Water
1: (0, 0, 255, 1.0), # Open Water
2: (180, 213, 244, 1.0), # Partial Surface Water
3: ( 0, 255, 255, 1.0), # Snow/Ice
4: (175, 175, 175, 1.0), # Cloud/Cloud Shadow
5: ( 0, 0, 127, 0.5), # Ocean Masked
}
c_labels = ["Not water", "Open water", "Partial Surface Water", "Snow/Ice",
"Cloud/Cloud Shadow", "Ocean Masked"]
c_ticks = list(COLORS.keys())
limits = (c_ticks[0]-0.5, c_ticks[-1]+0.5)
c_bar_opts = dict( ticker=FixedTicker(ticks=c_ticks),
major_label_overrides=dict(zip(c_ticks, c_labels)),
major_tick_line_width=0, )
image_opts.update({ 'cmap': list(COLORS.values()),
'clim': limits,
})
layout_opts.update(dict(colorbar_opts=c_bar_opts))
We’ll define the basemap as a separate object to overlay using the *
operator.
basemap = gv.tile_sources.ESRI(frame_width=500, frame_height=500, padding=0.05, alpha=0.25)
Finally, we can use the builtin Python slice
function to extract downsampled images quickly before trying to view the entire image. Remember, reducing the value steps
to 1
(or None
) plots the raster at full resolution.
%%time
steps = 1
view = raster.isel(longitude=slice(0,None,steps), latitude=slice(0,None,steps)).squeeze()
view.hvplot.image(**image_opts).opts(**layout_opts) * basemap
This raster is much larger than ones we’ve previously examined (requiring roughly 4 times the storage). This process could be iterated to make a slider showing the merged results from neighboring tiles at different times. This, of course, requires that there is enough memory available.