2i2c communities at AGU 2024

We are proud to share that several of 2i2c’s community partners are presenting their work at AGU 2024! In each case, 2i2c’s infrastructure plays a part in helping communities create and share knowledge, and grow their community. As an organization rooted in community-centric practices, we are particularly excited to see 2i2c represented “indirectly” at this conference, and to see ourselves in a supporting role that enables the impact of others.

Here’s a summary and links to all of the sessions. See below for a brief overview of each one.

ED31G-2272 Breaking down the barriers to Open Science with Project Pythia

Link to session

Hall B-C (Poster Hall) (Convention Center)

Abstract

Project Pythia is an open access educational initiative established with funding from the U.S. National Science Foundation. Its mission is to help students and scientists enhance their skills and adopt best practices using the tools and technologies of open science. As part of the Pangeo community, Project Pythia primarily focuses on the Pangeo stack, which includes cloud computing, Jupyter technologies, GitHub, and various software packages in the Scientific Python ecosystem, centered around Xarray. Project Pythia offers a wide range of open access content, such as datasets, software, tutorials, and annotated real-world workflows presented in the form of Jupyter Books.

Project Pythia serves as a resource for scientists, promoting and fostering open science. Although it is not a scientific research artifact itself, the development of Project Pythia adheres to many best practices advocated by open science proponents. The Pythia team actively encourages community engagement and collaborates openly with scientists and technologists to create new content. All Pythia resources are freely accessible, and the project follows the FAIR principles (Findable, Accessible, Interoperable, and Reusable) for managing research outputs, including publications, data, and other materials. We support and facilitate open evaluation and peer reviews of content to ensure verifiability and trust. Lastly, we endeavor to openly discuss ideas, designs, and methods before implementation.

This presentation will provide an overview of Project Pythia’s extensive educational resources and share our experiences in applying many open science principles to develop this flagship training resource for the geoscience community.

Authors

  • John Clyne - NSF National Center for Atmospheric Research (first author)
  • Drew Camron - University Corporation for Atmospheric Research
  • Orhan Eroglu - NSF National Center for Atmospheric Research
  • Robert Ford - University at Albany State University of New York
  • Julia Kent - NSF National Center for Atmospheric Research
  • Ryan May - University Corporation for Atmospheric Research
  • James Munroe - 2i2c / Code for Science and Society
  • Brian E J Rose - SUNY at Albany

ED31G-2277 PACE Hackweek: An open community keeping up with PACE

Link to session

Abstract

The NASA Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) mission, while bringing NASA’s Earth System Observatory up to speed with aquatic, atmospheric, and terrestrial science capabilities, is also providing data records of the Earth System for the next generation of scientists to grow into. The goal of the PACE Hackweek, supported by the Ocean Carbon & Biogeochemistry program and hosted at the University of Maryland Baltimore County in August 2024, was to enrich and support the practice of open science by both emerging and established researchers. Cloud-compute resources for the event were provided by CryoCloud, a NASA funded collaboration between ICESat-2 and the International Interactive Computing Collaboration (2i2c) to provide cryosphere researchers with a shared JupyterHub. We, the hackweek mentors, were buoyed by the NASA Openscapes program and adopted its mantra of striving toward “a kinder science for future us.” Participants faced two novelties: the “firehose” of data from the PACE instrument array (a hyper-spectral imaging spectrometer, a wide-swath hyper-angular polarimeter, and a narrow-swath spectro-polarimeter), and the distribution of PACE collections through the NASA Earthdata Cloud (a first for the Ocean Biology Distributed Active Archive Center). We present our approach and the challenges undertaken to hold an in-person, social coding event with 45 participants that provided a collaborative, supportive launchpad for doing open science with PACE. All lectures and tutorials produced for the event are freely available for examination and reuse. Our results additionally include highlights from the demonstration projects pursued by event participants and results from two post-event, qualitative surveys. One anonymous survey gathered participant feedback that will inform plans for growing these 45 participants into a lasting, open community. 
A separate, anonymous survey recorded participant demographics in order to evaluate our efforts at increasing diversity within the community of PACE data users. Key points of discussion include participant views, informed by our event, on whether and how the NASA Earthdata Cloud is a significant resource for the practice of open science with PACE, and how a shared JupyterHub can further the practice of open science by the community it serves.

Authors

  • Ian Carroll - NASA Goddard Space Flight Center; University of Maryland Baltimore County (first author)
  • Kelsey Bisson - NASA Headquarters
  • Sean Foley - NASA Goddard Space Flight Center; Morgan State University
  • Patrick Clifton Gray - University of Maine
  • Elizabeth E Holmes - NOAA Fisheries
  • Carina Poulin - Science Systems and Applications, Inc.; NASA Goddard Space Flight Center
  • Tasha Snow - NASA Goddard Space Flight Center; University of Maryland College Park
  • Guoqing Wang - NASA Goddard Space Flight Center; Science Systems and Applications, Inc.
  • Jeremy Werdell - NASA Goddard Space Flight Center
  • Anna Windle - NASA Goddard Space Flight Center; Science Systems and Applications, Inc.
  • Pengwang Zhai - Department of Physics, University of Maryland Baltimore County

IN13A-2147 Including more solutions and more solvers via actionable open science

Link to session

Monday, 9 December 2024 13:40 - 17:30
Hall B-C (Poster Hall) (Convention Center)

Abstract

If we’re asking people to change for open science, we must be willing to change ourselves. Internalizing this as individuals and institutions is critical - “to address our climate emergency, we must rapidly, radically reshape society. We need every solution and every solver” (Johnson & Wilkinson, All We Can Save).

Radically reshaping our society and including more solvers requires Earth scientists of all disciplines, across AGU, to work together in new ways. Many of these shifts can be considered open science. They change how we work daily, not just the open products we produce. And for that, people need to consider themselves part of a team, let go of perfection, and embrace a growth mindset to continually reflect and improve skills, no matter their job title. Further, open science requires all of us to see ourselves as leaders making small changes that collectively add up to a movement.

Openscapes is an open source approach to cultivating leaders and change makers. Openscapes’ flywheel approach intervenes and builds momentum through identifying mentors within organizations and mentoring teams curious about shifting to open science (Robinson & Lowndes 2022). Collectively, the Openscapes flywheel iterations have had a significant impact over the past five years across institutions like the federal government and academia that seem impossible to change. Through stories working with professional scientists over the past 5 years including at NASA, NOAA, Black Women in Ecology, Evolution, and Marine Science, and many universities, and open source software communities like 2i2c, Posit, Pangeo, and RLadies, we will share actionable insights for flourishing in the open science commons, and are interested in learning with and growing flywheel momentum further at AGU.

Authors

IN34A-01 Beyond Open Data: Ensuring True Accessibility for All (Invited)

Link to session

Abstract

The Earth Observation (EO) industry has seen rapid technological advancements alongside a massive increase in the number of private and public missions, leading to exponentially growing data archives. For publicly funded entities, this data is typically required to be freely available. However, open data does not always guarantee accessibility, and significant barriers remain for even the most advanced users. The stagnation in the use and adoption of open data can be attributed to several factors, including 1) challenges in unifying and maintaining metadata standards, 2) inefficiencies associated with legacy data formats, 3) a lack of training and resources for transitioning to cloud-based infrastructure, and 4) systemic social inequalities.

This talk will explore real-life examples of these barriers and highlight success stories that have emerged from partnerships largely originating within open-source communities, which foster diverse connections between private and public entities, including efforts like GeoZarr, pangeo-forge, Openscapes and 2i2c. While many advancements in improving the usability and accessibility of EO data have come from private efforts (e.g. Google Earth Engine), the shutdown of the Planetary Computer is a reminder of the need for publicly funded alternatives. The sustainability of open source projects will be addressed, with questions posed around reliable funding mechanisms as a means to ensure equitable development to address barriers and ensure accessibility for all. While this talk will be presented by one individual, it is the review and reflection of the work done by dozens of people across various organizations.

Authors

  • Brianna Rita Pagán - NASA Goddard Space Flight Center; ADNET Systems Inc. Greenbelt (first author)

Introducing GeoLab - An EarthScope JupyterHub for Enabling Collaborative Cloud-Native Geophysical Data Analysis and Skill Development Workshops

Link to session

Abstract

The EarthScope Consortium manages NSF’s GAGE and SAGE facilities and makes all of its geophysical data available in a commercial cloud system. This enables EarthScope and the communities it supports to leverage the abundant computational resources and cost-effective benefits of adopting data-proximate workflows with direct access to large, analysis-ready geophysical data sets.

In recent years, JupyterHub environments have gained popularity with data enthusiasts for their ability to provide open access to powerful compute resources. As part of a broad effort to support communities with intuitive resources to quickly adapt their workflows to the cloud, EarthScope has partnered with 2i2c to operate a scalable JupyterHub environment in AWS that will provide equitable access to cloud compute resources for researchers, educators, and the general public. GeoLab, the EarthScope hub, is aligned with related open science initiatives to establish rigorous and transparent standards for reproducible, data-intensive workflows. In addition to promoting interdisciplinary and inter-institutional collaborative work between researchers in GeoLab, EarthScope is developing and hosting workshops that can support both in-person and asynchronous learning modules that will train users how to utilize these new resources and transition their work to the cloud. We are excited to invite all geophysical data users to participate in the vigorous growth of this new platform and collaborate with adjacent open-science compute hub initiatives.

Authors

U13A-2349 Sharing recipes for cloud computing: the Project Pythia Cookbook Initiative

Link to session

Abstract

Project Pythia is the flagship education and training initiative of the Pangeo community. Pangeo has advanced transformative platforms and paradigms for “Big Data” geoscience in the cloud; Pythia is creating on-ramps for new users with open, interactive learning resources centered on Python in the geosciences. Pythia is now building a vibrant community-owned clearinghouse of accessible, reusable, and reproducible tutorials and exemplar workflows in the cloud known as Pythia Cookbooks.

“Cookbooks” imply collections of recipes for transforming raw ingredients (publicly available data) into scientifically useful results. Based on Jupyter notebooks, Cookbooks are explicitly tied to reproducible computational environments and supported by a rich cloud-based infrastructure enabling collaborative authoring and automated health-checking – essential tools in the struggle against the widespread notebook obsolescence problem. Cookbooks are hosted on Pythia’s searchable gallery and nurtured by a growing community of open science enthusiasts from across the geosciences. The Pythia Cookbook gallery is essentially a crowd-sourced, community-curated collection of best practices for data analysis and visualization.

Here we will outline the stack of technologies and infrastructure enabling cookbook creation, collaboration, testing, publication, and interactive deployment, and how these are used in service of building an inclusive participatory community. We will discuss existing technical and social hurdles for contributors, as well as new infrastructure developments in collaboration with the Executable Books Project that are reducing these hurdles.

Authors

U13A-2350 Supporting NASA Earthdata users in the Cloud: NASA Openscapes JupyterHub and User Onboarding & Fledging

Link to session

Abstract

In this talk we will highlight our NASA Openscapes community teaching approach to using the 2i2c-managed JupyterHub – how we’ve collaboratively developed it to meet user needs, and how it continues to enable researchers and users of NASA Earthdata in the new Cloud paradigm.

NASA Openscapes is an open source mentor community across NASA Earth science data centers (DAACs) that helps users explore and use the Cloud for their science and applications. Earthdata Cloud Cookbook is a learner-focused open source tutorial collection that we update openly as we learn together.

A critical piece of the NASA Openscapes effort is our NASA Openscapes 2i2c JupyterHub, a managed cloud computing space. By working with cloud early adopters and science Champions, responding and co-developing solutions, the JupyterHub has evolved since its early days in 2021. We support cloud computing for several languages (Python, R, MATLAB, QGIS) and common science libraries with corn; we streamlined how to bulk-add workshop participants via GitHub Teams; we established policy and technology for a special authentication mechanism for large scale workshops; we are developing earthaccess as a community-developed Python library for NASA Earthdata search and access, whether locally or in the cloud.

Based on the last 3 years of engaging with the user community and the Hub, we have evolved how we onboard (first experience in the cloud) and fledge (set up for Cloud that includes a plan, how to do it, how to pay for it; leaving the nest and perhaps building your own). Fledging is an important part of adoption and initiating users to the Cloud - where do researchers go when they decide to do their science in the Cloud? We’ve been developing practices that aim to be equitable and consider policy (cost), technical (where do people go, what admin setup is needed, what tech like base images etc are needed), and social (how do I learn, get support) aspects, and look forward to discussing further at AGU.

Authors

V31A-08 VICTOR – A new Cyber-infrastructure for Volcanology

Link to session

Abstract

Numerical models are essential for forecasting volcanic hazards for both short-term responses and long-term hazard assessment. While many models of volcanic processes already exist, challenges in finding, installing, and evaluating these models, coupled with limited computational resources, hinder their widespread use. To address this, we introduce VICTOR, the Volcanology Infrastructure for Computational Tools and Resources.

VICTOR is a cutting-edge cyber-infrastructure platform offering an open-source, cloud-based environment tailored for the volcanology community. It features Jupyter notebooks that integrate existing volcano models, such as the lava flow codes MOLASSES and IMEX_lava, the tephra and ash dispersal codes Tephra2 and HYSPLIT, and the mass flow code TITAN2D. The backend of VICTOR is managed as a JupyterHub, operated by the non-profit 2i2c under the Code for Science and Society.

VICTOR not only provides access to individual modeling tools, but also hosts workflows that use them in data inversion, model benchmarking, and uncertainty quantification. For example, we developed a workflow to validate mass flow models using multiple metrics and Bayesian statistics. VICTOR provides built-in access to external databases such as OpenTopography, Copernicus, and NASA’s remote sensing products to streamline obtaining and using data in workflows.

VICTOR also serves as an educational resource. In Spring 2023 and 2024 we taught graduate level, multi-institutional courses in Computational Volcanology using VICTOR, and we are creating multilingual tutorials for the workflows. We are developing teaching modules on topics such as lava flows and remote sensing to be shared with instructors. Lastly, VICTOR collaborates with national efforts including CONVERSE and SZ4D.

In summary, VICTOR addresses the critical need for accessible, effective volcanic hazard modeling tools and resources, fostering advancements in both research and education within the volcanology community.

Plain-language Summary

VICTOR is a new online platform designed to help scientists predict volcanic hazards more easily. Traditional models can be difficult to find, use, and combine with other tools. VICTOR solves these problems by offering a cloud-based, open-source environment with user-friendly tools.

VICTOR includes tools like Jupyter notebooks that combine various volcano models for lava flows, ash dispersal, and mass movements. It operates through JupyterHub, managed by the non-profit 2i2c. The platform not only provides access to these models but also offers workflows for tasks like data analysis and model validation. For example, it has a workflow for testing mass flow models using multiple evaluation methods.

VICTOR simplifies data access by connecting directly to databases like OpenTopography and NASA’s remote sensing products. It’s also an educational tool, used in graduate courses and offering multilingual tutorials. Additionally, VICTOR is developing teaching materials on topics like lava flows and remote sensing and collaborates with national projects like CONVERSE and SZ4D.

In essence, VICTOR makes volcanic hazard modeling more accessible and effective, benefiting both research and education in volcanology.

Authors

Chris Holdgraf
Executive Director / Co-lead of Business Development