Constructing Advanced Visualizations

Let’s apply some of the tools we’ve seen so far to make some more sophisticated visualizations. These will include using vector data from a GeoPandas GeoDataFrame, constructing both static plots and dynamic views from a 3D array, and combining vector data and raster data.

As context, the files we’ll examine are based on the 2022 McKinney widfire in Klamath National Forest (western Siskiyou County, California). The vector data is a snapshot of the boundary of a wildfire; the raster data corresponds to the observed land surface disturbance of vegetation (this will be explained in greater detail later).

Preliminary imports and file paths¶

To begin, some typical package imports are needed. We’ll also define some paths to local files containing relevant geospatial data.

from warnings import filterwarnings
filterwarnings('ignore')
from pathlib import Path
import numpy as np, pandas as pd, xarray as xr
import geopandas as gpd
import rioxarray as rio

# Imports for plotting
import hvplot.pandas, hvplot.xarray
import geoviews as gv
from geoviews import opts
gv.extension('bokeh')

ASSET_PATH = Path(FILE_STEM, 'assets')
SHAPE_FILE = ASSET_PATH / 'shapefiles' / 'mckinney' / 'McKinney_NIFC.shp'
RASTER_FILES = list(ASSET_PATH.glob('OPERA*VEG*.tif'))
RASTER_FILE = RASTER_FILES[0]

Plotting vector data from a `GeoDataFrame`¶

The GeoPandas package contains many useful tools for working with vector geospatial data. In particular, the GeoPandas GeoDataFrame is a subclass of the Pandas DataFrame that is specifically tailored for vector data stored in, e.g., shapefiles.

As an example, let’s load some vector data from the local path SHAPEFILE (as defined above).

shape_df = gpd.read_file(SHAPE_FILE)
shape_df

The object shape_df is a GeoDataFrame that has over two dozen columns of metadata in a single row. The main column that concerns us is the geometry column; this column stores the coordinates of the vertices of a MULTIPOLYGON, i.e., a set of polygons.

shape_df.geometry

The vertices of the polygons seem to be expressed as (longitude, latitude) pairs. We can verify what specific coordinate system is used by examining the GeoDataFrame’s crs attribute.

print(shape_df.crs)

Let’s use hvplot to generate a plot of this vector dataset.

shape_df.hvplot()

The projection in this plot is a bit strange. The HoloViz documentation includes a discussion of considerations when plotting geographic data; the salient point to remember in this immediate context is that the option geo=True is useful.

Let’s create two Python dictionaries—shapeplot_opts & layout_opts—and build up a visualization.

shapeplot_opts= dict(geo=True)
layout_opts = dict(xlabel='Longitude', ylabel="Latitude")
print(f"{shapeplot_opts=}")
print(f"{layout_opts=}")

shape_df.hvplot(**shapeplot_opts).opts(**layout_opts)

We can augment the plot by updating shapeplot_opts.

Specifying color=None means that the polygons will not be filled.
Specifying line_width and line_color modifies the appearance of the boundaries of the polygons.

shapeplot_opts.update( color=None ,
                       line_width=2,
                       line_color='red')
print(shapeplot_opts)

shape_df.hvplot(**shapeplot_opts).opts(**layout_opts)

Let’s fill the polygons with, say, color=orange and make the filled area partially transparent by specifying an alpha value between zero and one.

shapeplot_opts.update(color='orange', alpha=0.25)
print(shapeplot_opts)

shape_df.hvplot(**shapeplot_opts).opts(**layout_opts)

Adding a basemap¶

Next, let’s provide a basemap and overlay the plotted polygons using the * operator. Notice the use of parentheses to apply the opts method to the output of the * operator.

basemap = gv.tile_sources.ESRI(height=500, width=500, padding=0.1)

(shape_df.hvplot(**shapeplot_opts) * basemap).opts(**layout_opts)

The basemap does not need to be overlayed as as separate object; it can be specified using the tiles keyword. For instance, setting tiles='OSM' uses OpenStreetMap tiles instead. Notice the dimensions of the image rendered are likely not the same as the previous image with the ESRI tiles because we explicitly specified height=500 & width=500 in defining basemap earlier.

shapeplot_opts.update(tiles='OSM')
shape_df.hvplot(**shapeplot_opts).opts(**layout_opts)

Let’s remove the tiles option from shapeplot_opts and bind the resulting plot object to the identifier shapeplot.

del shapeplot_opts['tiles']
print(shapeplot_opts)

shapeplot = shape_df.hvplot(**shapeplot_opts)
shapeplot

Combining vector data with raster data in a static plot¶

Let’s combine this vector data with some raster data. We’ll load raster data from a local file using the function rioxarray.open_rasterio. For convenience, we’ll use method-chaining to relabel the coordinates of the DataArray loaded and use the squeeze method to convert the three-dimensional array loaded into a two-dimensional array.

raster = (
          rio.open_rasterio(RASTER_FILE)
            .rename(dict(x='longitude', y='latitude'))
            .squeeze()
         )
raster

We’ll use a Python dictionary image_opts to store useful keyord arguments to pass to hvplot.image.

image_opts = dict(
                    x='longitude',
                    y='latitude',                   
                    rasterize=True, 
                    dynamic=True,
                    cmap='hot_r', 
                    clim=(0, 100), 
                    alpha=0.8,
                    project=True,
                 )

raster.hvplot.image(**image_opts)

We can overlay shapeplot with the plotted raster data using the * operator. We can use the Pan, Wheel Zoom, and Box Zoom tools in the Bokeh toolbar to the right of the plot to zoom in and verify that the vector data has in fact been plotted on top of the raster data.

raster.hvplot.image(**image_opts) * shapeplot

We can additionally overlay the vector & raster data onto ESRI tiles using basemap.

raster.hvplot.image(**image_opts) * shapeplot * basemap

Notice many of the white pixels are obscuring the plot. It turns out that these pixels correspond to zeros in the raster data that can safely be ignored. We can filter those out using the where method.

raster = raster.where((raster!=0))
layout_opts.update(title="McKinney 2022 Wildfires")

(raster.hvplot.image(**image_opts) * shapeplot * basemap).opts(**layout_opts)

Constructing static plots from a 3D array¶

Let’s load a sequence of raster files into a three-dimensional array and produce some plots. To start, we’ll construct a loop to read DataArrays from the files RASTER_FILES and we’ll use xarray.concat to produce a single three-dimensional array of rasters (i.e., three $3,600\times3,600$ rasters stacked vertically). We’ll learn the specific interpretations associated with the raster dataset in a later notebook; for now, let’s just treat them as raw data to experiment with.

stack = []
for path in RASTER_FILES:
    data = rio.open_rasterio(path).rename(dict(x='longitude', y='latitude'))
    band_name = path.stem.split('_')[-1]
    data.coords.update({'band': [band_name]})
    data.attrs = dict(description=f"OPERA DIST product", units=None)
    stack.append(data)

stack = xr.concat(stack, dim='band')
stack = stack.where(stack!=0)

We relabel the axes longitude & latitude and we filter out pixels with value zero to make plotting simpler later.

stack

Having built the DataArray stack, we can focus on visualization.

If we want to generate a static plot with several images laid out, we can use hvplot.image together with the .layout method. To see how this works, let’s start by redefining dictionaries image_opts and layout_opts.

image_opts = dict(  x='longitude', 
                    y='latitude',
                    rasterize=True, 
                    dynamic=True,
                    cmap='hot_r',
                    crs=stack.rio.crs,
                    alpha=0.8,
                    project=True, 
                    aspect='equal',
                    shared_axes=False,
                    colorbar=True,
                    tiles='ESRI',
                    padding=0.1)
layout_opts = dict(xlabel='Longitude', ylabel="Latitude")

To speed up rendering, we’ll initially construct an object view that selects subsets of pixels; we initially define the parameter steps=200 to restrict the view to every 200th pixel in either direction. If we reduce steps, it takes longer to render. Setting steps=1 or steps=None is equivalent to selecting all pixels.

steps = 200
subset = slice(0, None, steps)
layout_opts.update(frame_width=250, frame_height=250)


view = stack.isel(latitude=subset, longitude=subset)
view.hvplot.image(**image_opts).opts(**layout_opts).layout()

The layout method by default plotted each of the three rasters selected along the band axis horizontally.

Constructing a dynamic view from a 3D array¶

Another way to visualize a three-dimensional array is to associate a selection widget with one of the axes. If we call hvplot.image without appending the .layout method, the result is a dynamic map. In this instance, the selection widget allows us to choose slices from along the band axis.

Again, increasing the parameter steps reduces the rendering time. Decreasing it to 1 or None renders at full resolution.

steps = 200
subset = slice(0, None, steps)
layout_opts.update(frame_height=400, frame_width=400)

view = stack.isel(latitude=subset, longitude=subset)
view.hvplot.image(**image_opts).opts(**layout_opts)

Later, we’ll stack many rasters with distinct timestamps along a time axis; when there are many slices, the selection widget will be rendered as a slider rather than as a drop-down menu. We can control the location of the widget using a keyword option widget_location.

view.hvplot.image(widget_location='bottom_left', **image_opts, **layout_opts)

Notice adding the widget_location option does slightly modify the sequence in which options are specified. That is, if we invoke something like

view.hvplot.image(widget_location='top_left', **image_opts).opts(**layout_opts)

an exception is raised (hence the invocation in the code cell above). There are some subtle difficulties in working out the correct sequence to apply options when customizing HoloViz/Hvplot objects (mainly due to the ways in which namespaces overlap with Bokeh or other rendering back-ends). The best strategy is to start with as few options as possible and to experiment by incrementally adding options until we get the final visualization we want.

Combining vector data with raster data in a dynamic view¶

Finally, let’s overlay our vector data from before—the boundary of the wildfire—with the dynamic view of this three-dimensional array of rasters. We can use the * operator to combine the output of hvplot.image with shapeplot, the rendered view of the vector data.

steps = 200
subset = slice(0, None, steps)
view = stack.isel(latitude=subset, longitude=subset)

image_opts.update(colorbar=False)
layout_opts.update(frame_height=500, frame_width=500)
(view.hvplot.image(**image_opts) * shapeplot).opts(**layout_opts)

Again, getting the options specified correctly can take a bit of experimentation. The important ideas to take away here are:

how to load relevant datasets with geopandas and rioxarray; and
how to use hvplot interactively & incrementally to produce compelling visualizations.

2. Geospatial fundamentals

Data Visualization with GeoViews & HvPlot

3. Using NASA EarthData

Using OPERA DIST Products