This notebook illustrates the use of icepyx for managing lists of available and wanted ICESat-2 data variables. The two use cases for variable management within your workflow are:
- During the data access process, whether that’s via order and download (e.g. via NSIDC DAAC) or remote (e.g. via the cloud).
- When reading in data to a Python object (whether from local files or the cloud).
A given ICESat-2 product may have over 200 variable + path combinations.
icepyx includes a custom Variables module that is “aware” of the ATLAS sensor and how the ICESat-2 data products are stored.
The module can be accessed independently or as a component of a Query object or Read object.
This notebook illustrates in detail how the Variables module behaves. We use the module independently and also show how powerful it is directly in the icepyx workflow using a Query data access example.
Usage through an icepyx Read object is analogous.
More detailed example workflows specifically for the query and read tools within icepyx are available as separate Jupyter Notebooks.
Questions? Be sure to check out the FAQs throughout this notebook, indicated as italic headings.
Why do ICESat-2 products need a custom variable manager?
It can be confusing and cumbersome to comb through the 200+ variable and path combinations contained in ICESat-2 data products. An HDF5 file is built like a folder with files in it. Opening an ICESat-2 file can be like opening a new folder with over 200 files in it and manually searching for only the ones you want!
The icepyx Variables
module makes it easier for users to quickly find and extract the specific variables they would like to work with across multiple beams, keywords, and variables and provides reader-friendly formatting to browse variables.
A future development goal for icepyx
includes developing an interactive widget to further improve the user experience.
For data read-in, additional tools are available to target specific beam characteristics (e.g. strong versus weak beams).
Import packages, including icepyx
import icepyx as ipx
from pprint import pprint
Creating or Accessing ICESat-2 Variables
There are three ways to create or access an ICESat-2 Variables object in icepyx:
- Access via the .order_vars property of a Query object
- Access via the .vars property of a Read object
- Create a stand-alone ICESat-2 Variables object using a local file, cloud file, or a product name
An example of each of these is shown below.
1. Access Variables via the .order_vars property of a Query object
region_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-22','2019-02-28'], \
start_time='00:00:00', end_time='23:59:59')
# Accessing Variables
region_a.order_vars
# Showing the variable paths
region_a.order_vars.avail()
2. Access via the .vars property of a Read object
path_root = '/full/path/to/your/data/'
reader = ipx.Read(path_root)
# Accessing Variables
reader.vars
# Showing the variable paths
# reader.vars.avail()
3. Create a stand-alone Variables object
You can also generate an independent Variables object. This can be done using either:
- The filepath to a local or cloud file you’d like a variables list for
- The product name (and optionally version) of an ICESat-2 product
Note: Cloud data access requires a valid Earthdata login; you will be prompted to log in if you are not already authenticated.
Create a variables object from a filepath:
filepath = '/full/path/to/your/data.h5'
v = ipx.Variables(path=filepath)
# v.avail()
Create a variables object from a product. The version argument is optional.
v = ipx.Variables(product='ATL03')
# v.avail()
v = ipx.Variables(product='ATL03', version='006')
# v.avail()
Now that you know how to create or access Variables, the remainder of this notebook showcases the functions available for building and modifying variables lists. Remember, the examples shown below use a Query object, but the same methods are available if you are using a Read object or a Variables object.
Interacting with ICESat-2 Data Variables
Each variables instance (which is actually an associated Variables class object) contains two variable list attributes.
One is the list of possible or available variables (avail attribute) and is immutable, or unchangeable, as it is based on the input product specifications or files.
The other is the list of variables you’d like to actually have (in your downloaded file or data object) from all the potential options (wanted attribute) and is updateable.
Thus, your avail list depends on your data source and whether you are accessing or reading data, while your wanted list may change for each analysis you are working on or depending on what variables you want to see.
A Variables object has methods to:
- get a list of all available variables, either available from the NSIDC or the file (avail() method).
- append new variables to the wanted list (append() method).
- remove variables from the wanted list (remove() method).
We’ll showcase the use of all of these methods and attributes below using an icepyx.Query
object.
Usage is identical in the case of an icepyx.Read
object.
Create a query object and log in to Earthdata
For this example, we’ll be working with a land ice product (ATL06) for an area along West Greenland (Disko Bay). A second option for an atmospheric product (ATL09) that uses profiles instead of the ground track (gt) categorization is also provided.
region_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-22','2019-02-28'], \
start_time='00:00:00', end_time='23:59:59')
# Uncomment and run the code in this cell to use the second variable subsetting suite of examples,
# with the beam specifier containing "profile" instead of "gt#l"
# region_a = ipx.Query('ATL09',[-55, 68, -48, 71],['2019-02-22','2019-02-28'], \
# start_time='00:00:00', end_time='23:59:59')
ICESat-2 data variables
ICESat-2 data is natively stored in a nested file format called HDF5.
Much like a directory-file system on a computer, each variable (file) has a unique path through the hierarchy (directories) within the file.
Because the same variable name can appear in several groups, some variables (e.g. 'latitude', 'longitude') have multiple paths (one for each of the six beams in most products).
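To make the directory analogy concrete, here is a minimal sketch in plain Python. It is not how icepyx or any HDF5 library works internally: the nested dictionary `toy_file` and the helper `list_paths` are made up for illustration, and only the group and variable names are borrowed from the ATL06 examples in this notebook.

```python
# Toy stand-in for an HDF5 hierarchy: dicts are groups, None marks a variable.
# The structure is illustrative only and far smaller than a real ATL06 file.
toy_file = {
    "gt1l": {"land_ice_segments": {"latitude": None, "longitude": None}},
    "gt1r": {"land_ice_segments": {"latitude": None, "longitude": None}},
    "orbit_info": {"rgt": None},
}


def list_paths(group, prefix=""):
    """Recursively collect the full path of every variable in a nested dict."""
    paths = []
    for name, child in group.items():
        full = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            paths.extend(list_paths(child, full))  # descend into a sub-group
        else:
            paths.append(full)  # reached a variable
    return paths


all_paths = list_paths(toy_file)

# The same variable name shows up once per beam group.
latitude_paths = [p for p in all_paths if p.split("/")[-1] == "latitude"]
print(latitude_paths)
```

Each full path is unique, but the final component (the variable name) repeats across beam groups, which is why a variable manager that understands the path structure is useful.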
Determine what variables are available
region_a.order_vars.avail() will return a list of all valid path+variable strings.
region_a.order_vars.avail()
To increase readability, you can use built-in functions to show the 200+ variable + path combinations as a dictionary where the keys are variable names and the values are the paths to that variable.
region_a.order_vars.parse_var_list(region_a.order_vars.avail())
will return a dictionary of variable:paths key:value pairs.
region_a.order_vars.parse_var_list(region_a.order_vars.avail())
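As a rough mental model of the variable-to-paths grouping described above, the sketch below builds such a dictionary from a short, hand-written list of path strings. The helper `group_by_variable` and the sample list are assumptions for illustration; the structure actually returned by parse_var_list may differ, so this should not be read as the icepyx implementation.

```python
from collections import defaultdict

# Hypothetical sample of path+variable strings, in the style of the avail() list.
sample_paths = [
    "gt1l/land_ice_segments/latitude",
    "gt1l/land_ice_segments/longitude",
    "gt2l/land_ice_segments/latitude",
    "orbit_info/rgt",
]


def group_by_variable(paths):
    """Map each variable name (last path component) to every path ending in it."""
    grouped = defaultdict(list)
    for path in paths:
        variable = path.rsplit("/", 1)[-1]
        grouped[variable].append(path)
    return dict(grouped)


by_variable = group_by_variable(sample_paths)
print(by_variable["latitude"])
```

Grouping by the last path component makes it easy to see at a glance which beams or groups carry a given variable.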
By passing the boolean options=True
to the avail
method, you can obtain lists of unique possible variable inputs (var_list inputs) and path subdirectory inputs (keyword_list and beam_list inputs) for your data product. These can be helpful for building your wanted variable list.
region_a.order_vars.avail(options=True)
# Using a Read object
reader.vars.avail()
reader.vars.parse_var_list(reader.vars.avail())
reader.vars.avail(options=True)
# Using a file on your computer
v = ipx.Variables(path='/my/file.h5')
v.avail()
v.parse_var_list(v.avail())
v.avail(options=True)
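To give a feel for what the options listing summarizes, the sketch below splits a few path strings into unique variable names, beam/profile names, and remaining keywords. Everything here is an assumption for illustration: the helper `summarize_options`, the regular expression used to recognize beams and profiles, and the returned dictionary layout are not taken from icepyx, whose own output may be organized differently.

```python
import re

# Assumed naming scheme: ground tracks look like gt1l..gt3r, profiles like profile_1..3.
BEAM_PATTERN = re.compile(r"^(gt[1-3][lr]|profile_[1-3])$")


def summarize_options(paths):
    """Collect the unique variables, beams/profiles, and other path keywords."""
    variables, beams, keywords = set(), set(), set()
    for path in paths:
        *groups, variable = path.split("/")
        variables.add(variable)
        for group in groups:
            if BEAM_PATTERN.match(group):
                beams.add(group)
            else:
                keywords.add(group)
    return {
        "var_list": sorted(variables),
        "beam_list": sorted(beams),
        "keyword_list": sorted(keywords),
    }


# Hypothetical sample of available paths.
options = summarize_options([
    "gt1l/land_ice_segments/latitude",
    "gt1l/land_ice_segments/longitude",
    "gt2r/land_ice_segments/latitude",
    "orbit_info/rgt",
])
print(options)
```

The three lists correspond to the three kinds of input (var_list, beam_list, keyword_list) used when building a wanted list.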
Building your wanted variable list
Now that you know which variables and path components are available, you need to build a list of the ones you’d like included. There are several options for generating your initial list as well as modifying it, giving the user complete control.
The options for building your initial list are:
- Use a default list for the product (not yet fully implemented across all products. Have a default variable list for your field/product? Submit a pull request or post it as an issue on GitHub!)
- Provide a list of variable names
- Provide a list of profiles/beams or other path keywords, where “keywords” are simply the unique subdirectory names contained in the full variable paths of the product. A full list of available keywords for the product is displayed in the error message upon entering keyword_list=[''] into the append function (see below for an example) or by running region_a.order_vars.avail(options=True), as above.
Note: all products have a short list of “mandatory” variables/paths (containing spacecraft orientation and time information needed to convert the data’s delta_time
to a readable datetime) that are automatically added to any built list. If you have any recommendations for other variables that should always be included (e.g. uncertainty information), please let us know!
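As an illustration of why the time information matters, the sketch below converts a delta_time value into a readable datetime. It assumes that delta_time counts seconds from the ATLAS Standard Data Product epoch of 2018-01-01T00:00:00 UTC and it ignores leap seconds; in a real workflow the epoch should come from the file's own metadata rather than a hard-coded constant. The function name `delta_time_to_datetime` and the sample value are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: delta_time is measured in seconds from this epoch.
# Confirm against your product's metadata before relying on it.
ATLAS_SDP_EPOCH = datetime(2018, 1, 1, tzinfo=timezone.utc)


def delta_time_to_datetime(delta_time_seconds):
    """Convert seconds since the assumed ATLAS SDP epoch to a UTC datetime."""
    return ATLAS_SDP_EPOCH + timedelta(seconds=delta_time_seconds)


# 418 days after the epoch, expressed in seconds (a made-up sample value).
example = delta_time_to_datetime(418 * 86400)
print(example.isoformat())
```

The sample value lands on 2019-02-23, inside the date range used for the query in this notebook.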
Examples of using each method to build and modify your wanted variable list are below.
region_a.order_vars.wanted
region_a.order_vars.append(defaults=True)
pprint(region_a.order_vars.wanted)
The keywords available for this product are shown in the error message upon entering a blank keyword_list, as seen in the next cell.
region_a.order_vars.append(keyword_list=[''])
Modifying your wanted variable list
Generating and modifying your variable request list, which is stored in region_a.order_vars.wanted
, is controlled by the append
and remove
functions that operate on region_a.order_vars.wanted
. The input options to append
are as follows (the full documentation for this function can be found by executing help(region_a.order_vars.append)
).
- defaults (default False) - include the default variable list for your product (not yet fully implemented for all products; please submit your default variable list for inclusion!)
- var_list (default None) - list of variables (entered as strings)
- beam_list (default None) - list of beams/profiles (entered as strings)
- keyword_list (default None) - list of keywords (entered as strings); use keyword_list=[''] to obtain a list of available keywords
Similarly, the options for remove
are:
- all (default False) - reset region_a.order_vars.wanted to None
- var_list (as above)
- beam_list (as above)
- keyword_list (as above)
region_a.order_vars.remove(all=True)
pprint(region_a.order_vars.wanted)
Examples (Overview)
Below are a series of examples to show how you can use append
and remove
to modify your wanted variable list.
For clarity, region_a.order_vars.wanted
is cleared at the start of many examples.
However, multiple append
and remove
commands can be called in succession to build your wanted variable list (see Examples 1.3+ and 2.3+).
There are two example tracks. The first is for land ice (ATL06) data that is separated into beams. The second is for atmospheric data (ATL09) that is separated into profiles. Both example tracks showcase the same functionality and are provided for users of both data types.
Example Track 1 (Land Ice - run with ATL06 dataset)
Example 1.1: choose variables
Add all latitude
and longitude
variables across all six beam groups. Note that the additional required variables for time and spacecraft orientation are included by default.
region_a.order_vars.append(var_list=['latitude','longitude'])
pprint(region_a.order_vars.wanted)
Example 1.2: specify beams and variable
Add latitude
for only gt1l
and gt2l
region_a.order_vars.remove(all=True)
pprint(region_a.order_vars.wanted)
var_dict = region_a.order_vars.append(beam_list=['gt1l', 'gt2l'], var_list=['latitude'])
pprint(region_a.order_vars.wanted)
Example 1.3: add/remove selected beams+variables
Add latitude
for gt3l
and remove it for gt2l
region_a.order_vars.append(beam_list=['gt3l'],var_list=['latitude'])
region_a.order_vars.remove(beam_list=['gt2l'], var_list=['latitude'])
pprint(region_a.order_vars.wanted)
Example 1.4: keyword_list
Add latitude
and longitude
for all beams and with keyword land_ice_segments
region_a.order_vars.append(var_list=['latitude', 'longitude'],keyword_list=['land_ice_segments'])
pprint(region_a.order_vars.wanted)
Example 1.5: target a specific variable + path
Remove gt1r/land_ice_segments/longitude
(but keep gt1r/land_ice_segments/latitude
)
region_a.order_vars.remove(beam_list=['gt1r'], var_list=['longitude'], keyword_list=['land_ice_segments'])
pprint(region_a.order_vars.wanted)
Example 1.6: add variables not specific to beams/profiles
Add rgt
under orbit_info
.
region_a.order_vars.append(keyword_list=['orbit_info'],var_list=['rgt'])
pprint(region_a.order_vars.wanted)
Example 1.7: add all variables+paths of a group
In addition to adding specific variables and paths, we can filter all variables with a specific keyword as well. Here, we add all variables under orbit_info
. Note that paths already in region_a.order_vars.wanted
, such as 'orbit_info/rgt'
, are not duplicated.
region_a.order_vars.append(keyword_list=['orbit_info'])
pprint(region_a.order_vars.wanted)
Example 1.8: add all possible values for variables+paths
Append all longitude
paths and all variables/paths with keyword land_ice_segments
.
Similarly to what is shown in Example 1.4, if you submit only one append
call as region_a.order_vars.append(var_list=['longitude'], keyword_list=['land_ice_segments'])
rather than the two append
calls shown below, you will only add the variable longitude
and only paths containing land_ice_segments
, not ALL paths for longitude
and ANY variables with land_ice_segments
in their path.
region_a.order_vars.append(var_list=['longitude'])
region_a.order_vars.append(keyword_list=['land_ice_segments'])
pprint(region_a.order_vars.wanted)
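The difference between one combined append call and two separate calls, as described in this example, can be pictured with a small filter in plain Python. This is a sketch based only on the behavior described in the text: `matches` is a hypothetical helper, not an icepyx function, and the `other_group` path is invented so that there is a longitude outside land_ice_segments to compare against.

```python
def matches(path, var_list=None, keyword_list=None):
    """Return True if the path satisfies every filter that was supplied."""
    *groups, variable = path.split("/")
    if var_list is not None and variable not in var_list:
        return False
    if keyword_list is not None and not any(k in groups for k in keyword_list):
        return False
    return True


available = [
    "gt1l/land_ice_segments/latitude",
    "gt1l/land_ice_segments/longitude",
    "gt1l/other_group/longitude",  # hypothetical path, for illustration only
    "orbit_info/rgt",
]

# One combined call: a path must match the variable AND the keyword.
one_call = [
    p for p in available
    if matches(p, var_list=["longitude"], keyword_list=["land_ice_segments"])
]

# Two separate calls: a path is kept if it matches EITHER filter.
two_calls = [
    p for p in available
    if matches(p, var_list=["longitude"])
    or matches(p, keyword_list=["land_ice_segments"])
]

print(one_call)
print(two_calls)
```

In this toy setup the combined call keeps only the longitude under land_ice_segments, while the two separate calls keep every longitude path plus everything under land_ice_segments.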
Example 1.9: remove all variables+paths associated with a beam
Remove all paths for gt1l
and gt3r
region_a.order_vars.remove(beam_list=['gt1l','gt3r'])
pprint(region_a.order_vars.wanted)
Example 1.10: generate a default list for the rest of the tutorial
Generate a reasonable variable list prior to download
region_a.order_vars.remove(all=True)
region_a.order_vars.append(defaults=True)
pprint(region_a.order_vars.wanted)
Example Track 2 (Atmosphere - run with ATL09 dataset commented out at the start of the notebook)
Example 2.1: choose variables
Add all latitude
and longitude
variables
region_a.order_vars.append(var_list=['latitude','longitude'])
pprint(region_a.order_vars.wanted)
Example 2.2: specify beams/profiles and variable
Add latitude
for only profile_1
and profile_2
region_a.order_vars.remove(all=True)
pprint(region_a.order_vars.wanted)
var_dict = region_a.order_vars.append(beam_list=['profile_1','profile_2'], var_list=['latitude'])
pprint(region_a.order_vars.wanted)
Example 2.3: add/remove selected beams+variables
Add latitude
for profile_3
and remove it for profile_2
region_a.order_vars.append(beam_list=['profile_3'],var_list=['latitude'])
region_a.order_vars.remove(beam_list=['profile_2'], var_list=['latitude'])
pprint(region_a.order_vars.wanted)
Example 2.4: keyword_list
Add latitude
for all profiles and with keyword low_rate
region_a.order_vars.append(var_list=['latitude'],keyword_list=['low_rate'])
pprint(region_a.order_vars.wanted)
Example 2.5: target a specific variable + path
Remove 'profile_1/high_rate/latitude'
(but keep 'profile_3/high_rate/latitude'
)
region_a.order_vars.remove(beam_list=['profile_1'], var_list=['latitude'], keyword_list=['high_rate'])
pprint(region_a.order_vars.wanted)
Example 2.6: add variables not specific to beams/profiles
Add rgt
under orbit_info
.
region_a.order_vars.append(keyword_list=['orbit_info'],var_list=['rgt'])
pprint(region_a.order_vars.wanted)
Example 2.7: add all variables+paths of a group
In addition to adding specific variables and paths, we can filter all variables with a specific keyword as well. Here, we add all variables under orbit_info
. Note that paths already in region_a.order_vars.wanted
, such as 'orbit_info/rgt'
, are not duplicated.
region_a.order_vars.append(keyword_list=['orbit_info'])
pprint(region_a.order_vars.wanted)
Example 2.8: add all possible values for variables+paths
Append all longitude
paths and all variables/paths with keyword high_rate
.
Similarly to what is shown in Example 2.4, if you submit only one append
call as region_a.order_vars.append(var_list=['longitude'], keyword_list=['high_rate'])
rather than the two append
calls shown below, you will only add the variable longitude
and only paths containing high_rate
, not ALL paths for longitude
and ANY variables with high_rate
in their path.
region_a.order_vars.append(var_list=['longitude'])
region_a.order_vars.append(keyword_list=['high_rate'])
pprint(region_a.order_vars.wanted)
Example 2.9: remove all variables+paths associated with a profile
Remove all paths for profile_1
and profile_3
region_a.order_vars.remove(beam_list=['profile_1','profile_3'])
pprint(region_a.order_vars.wanted)
Example 2.10: generate a default list for the rest of the tutorial
Generate a reasonable variable list prior to download
region_a.order_vars.remove(all=True)
region_a.order_vars.append(defaults=True)
pprint(region_a.order_vars.wanted)
Using your wanted variable list
Now that you have your wanted variables list, you need to pass it to your Query object explicitly, while a Read object will use it automatically.
With a Query object
In order to have your wanted variable list included with your order, you must pass it as a keyword argument to the subsetparams() method or to the order_granules() or download_granules() functions (download_granules() calls order_granules() under the hood if you have not already placed your order).
region_a.subsetparams(Coverage=region_a.order_vars.wanted)
Or, you can put the Coverage
parameter directly into order_granules
:
region_a.order_granules(Coverage=region_a.order_vars.wanted)
However, then you cannot view your subset parameters (region_a.subsetparams
) prior to submitting your order.
region_a.order_granules()# <-- you do not need to include the 'Coverage' kwarg to
# order if you have already included it in a call to subsetparams
region_a.download_granules('/home/jovyan/icepyx/dev-notebooks/vardata') # <-- you do not need to include the 'Coverage' kwarg to
# download if you have already submitted it with your order
With a Read object
Calling the load()
method on your Read
object will automatically look for your wanted variable list and use it.
Please see the read-in example Jupyter Notebook for a complete example of this usage.
With a local filepath
One of the benefits of using a local filepath in variables is that it allows you to easily inspect the variables that are available in your file. Once you have a variable of interest from the avail
list, you could read that variable in with another library, such as xarray. The example below demonstrates this assuming an ATL06 ICESat-2 file.
filepath = '/full/path/to/my/ATL06_file.h5'
v = ipx.Variables(path=filepath)
v.avail()
# Browse paths and decide you need `gt1l/land_ice_segments/`
import xarray as xr
xr.open_dataset(filepath, group='gt1l/land_ice_segments/', engine='h5netcdf')
You’ll notice that in this workflow you are limited to viewing data only within a particular group. icepyx also provides functionality for merging variables within or even across files. See the read-in example Jupyter Notebook for more details about these features of icepyx.
Credits
- based on the subsetting notebook by: Jessica Scheick and Zheng Liu