This tutorial is part of a project which focuses on leveraging the vast amount of Earth science data available through the NASA Earthdata Cloud to better understand and forecast environmental risks such as wildfire, drought, and floods. At its core, this project embodies the principles of open science, aiming to make data, methods, and findings accessible to all. We aim to equip learners with the skills to analyze, visualize, and report on data related to these critical environmental risks through open science-based workflows and the use of cloud-based data computing.
What is Open Science?
“Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity.”
Availability of Open Science Resources:
- Many open science resources already exist, including over 100 petabytes of openly available NASA data.
- Tools and practices for collaboration and code development.
Outputs and Project Openness:
- Choosing between making a project open from its inception and opening it at publication.
- Making data, code, and results open.
Importance of Sharing and Impact:
- Enhances the discoverability and accessibility of scientific processes and outputs.
- Open methods enhance reproducibility.
- Transparency and verifiability enhance accuracy.
- Scrutiny of analytic decisions promotes trust.
- Accessible data and collective efforts accelerate discoveries.
- Open science fosters inclusion, diversity, equity, and accessibility (IDEA).
- And much more.
Why now?
The internet offers numerous platforms for public hosting and free access to research and data. These platforms, coupled with advancements in computational power, empower individuals to engage in sophisticated data analysis. This connectivity facilitates the integration of participants, stakeholders, and outcomes of open science initiatives online.
Science and science communication confront growing resistance from the public due to concerns about result reproducibility and the proliferation of misinformation. Open science practices address these challenges by leveraging community feedback to validate results more rigorously and by making findings readily accessible to the public, countering misinformation.
Scientific rigor and accuracy are bolstered when researchers validate their peers’ findings. However, the lack of access to original data and code in scientific articles delays this process.
Where to start: Open Research Products
Scientific knowledge, or research products, takes the form of data, code, and results.
What is data?
Scientifically or technically relevant information that can be stored digitally and accessed electronically, such as:
- Information produced by missions and experiments, including calibrations, coefficients, and documentation.
- Information needed to validate scientific conclusions of peer-reviewed publications.
- Metadata.
What is code?
- General Purpose Software – Software produced for widespread use, not specialized scientific purposes. This encompasses both commercial software and open-source software.
- Operational and Infrastructure Software – Software used by data centers and large information technology facilities to provide data services.
- Libraries – Generic tools that implement well-known algorithms or provide statistical analysis or visualization, and that are incorporated into other categories of software.
- Modeling and Simulation Software – Software that either implements solutions to mathematical equations given input data and boundary conditions, or infers models from data.
- Analysis Software – Software developed to manipulate measurements or model results to visualize or gain understanding.
- Single-use Software – Software written for use in unique instances, such as making a plot for a paper, or manipulating data in a specific way.
What are results?
Results capture the different research outputs of the scientific process. Publications are the most common type of result, but results can take many forms:
- Peer-reviewed publications
- Computational notebooks
- Blog posts
- Videos and podcasts
- Social media posts
- Conference abstracts and presentations
- Forum discussions
The products that others need to reproduce the findings are created throughout the scientific process. These research products include data, code, analysis pipelines, papers, and more.