Public views of the datasets published by
2i2c-org/data.
Each dataset is published as a GitHub Release; this site renders it so
community members can read it without leaving the browser.
Datasets
Cloud - monthly active users on each 2i2c cluster and hub.
How to find raw data
This site is built with MyST, and raw data for
every plot is downloadable from the page that shows it. File names
include a content hash (e.g. cloudbank-unique-[hash].csv), so the
exact URL changes whenever the underlying data does. This means you
can’t hard-code the link.
To resolve the current link, parse the page’s MyST AST. Every rendered
page has a .json sibling at the same URL (e.g. cloud.cloudbank.json
for /cloud/cloudbank/). Each static-file link has a urlSource
property (the pre-hash filename) and a url property (the hashed path
actually served). To find a download link, walk the AST, match on urlSource, and return url. Here’s a little example of a recursive function that does this in Python:
import json
from urllib.request import urlopen

PAGE = "https://2i2c-org.github.io/data/cloud.cloudbank.json"
WANTED = "cloudbank-unique.csv"


def find_url(node, source):
    if isinstance(node, dict):
        # If we find the source, return its URL
        if node.get("urlSource") == source:
            return node["url"]
        # Iterate through dictionary values otherwise
        for v in node.values():
            found = find_url(v, source)
            if found:
                return found
    elif isinstance(node, list):
        # Iterate through list items if it's a list
        for item in node:
            found = find_url(item, source)
            if found:
                return found


ast = json.load(urlopen(PAGE))
print(find_url(ast, WANTED))
# /build/cloudbank-unique-<hash>.csv

Prefix the returned path with the site origin (e.g. https://2i2c.org/data) to
get the full download URL.
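
If you need every download on a page rather than a single file, the same walk can collect all urlSource/url pairs at once. The sketch below is a variation on the lookup described above, not part of the site's tooling. The helper names (ast_url, collect_downloads, full_download_url) are hypothetical, the dot-joined slug rule is an assumption generalized from the single cloud.cloudbank.json example, and the sample AST and its "abc123" hash are made up for illustration, so nothing here touches the network.

```python
def ast_url(site_base, page_path):
    """Guess the .json sibling of a rendered page.

    Assumption: path segments are joined with dots, as in the
    /cloud/cloudbank/ -> cloud.cloudbank.json example.
    """
    slug = ".".join(part for part in page_path.split("/") if part)
    return f"{site_base.rstrip('/')}/{slug}.json"


def collect_downloads(node, found=None):
    """Walk the AST and map each urlSource to its hashed url."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        source, url = node.get("urlSource"), node.get("url")
        if isinstance(source, str) and isinstance(url, str):
            # Keep the first match if the same file is linked twice
            found.setdefault(source, url)
        for value in node.values():
            collect_downloads(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_downloads(item, found)
    return found


def full_download_url(site_base, path):
    """Prefix a hashed path with the site origin, avoiding a double slash."""
    return f"{site_base.rstrip('/')}/{path.lstrip('/')}"


# Hypothetical AST fragment; real pages are larger and the hash will differ.
sample_ast = {
    "type": "root",
    "children": [
        {
            "type": "link",
            "urlSource": "cloudbank-unique.csv",
            "url": "/build/cloudbank-unique-abc123.csv",
        },
        {"type": "paragraph", "children": [{"type": "text", "value": "hi"}]},
    ],
}

downloads = collect_downloads(sample_ast)
for source, path in downloads.items():
    print(source, "->", full_download_url("https://2i2c.org/data", path))
```

Plain string joining is used instead of urllib.parse.urljoin because the returned path starts with a slash, and urljoin would then drop the /data part of the origin.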