
Candidate Initiatives

These are initiatives that we are considering for future work. As we complete initiatives on our roadmap, we pull from this list for what to do next.

Allow users to publish interactive content served by JupyterHub

As a data scientist, I often want a way to share my analyses and reports in a form that others can quickly interact with. I must share these with stakeholders who are non-technical, or who don’t have the time to fire up their own kernels. I want these reports to be very fast to access, and powered by the same environments that power my interactive kernels. I’d like them to be persistently available to view, and ideally persistently executable as well.

Currently, there is no way to easily share a viewable version of my analysis on the same hub, much less an interactive one. I’d have to publish to GitHub and send a link to that, or use a third-party service like nbviewer, notebooksharing.space, jupyterbook.pub, etc.

In short, I want a fast way to:

  • Create a report that includes interactive outputs

  • Make that report available via a URL served from the hub, either publicly or requiring authenticated access

  • Have that report powered by computation on the hub

GitHub Initiative »

JupyterLite for 2i2c Communities

As an instructor, I want to share interactive content with learners without needing to keep or manage storage for these users. I am prepared to accept potentially incompatible libraries and restrictions on loading data from external websites in exchange for not requiring any cloud-managed compute. My learners have modest memory and compute needs for executing the content.

GitHub Initiative »

Performance tune and test JupyterHub for ~10k active concurrent users

As a JupyterHub admin, I want high levels of confidence that my infrastructure can handle large numbers of users (~10k active concurrent users, with 100k total users). I would like to be able to test my configuration by simulating that many users, and tweaking my configuration until I’m satisfied with the overall performance of my infrastructure. This is particularly helpful when I’m trying to run a MOOC, as I expect many users to be accessing my platform simultaneously.

GitHub Initiative »

Securely autograde student notebooks built with otter-grader

As an instructor making educational content, I want to use a community-supported way to do automatic grading of students’ notebooks. I want this to provide them with instant feedback as they work through the content, as well as a secure way to automatically grade their end results and post the grades back to the LMS (Learning Management System) I use (like Canvas).

As a student, I want a simple button I can click in my interface to submit my notebook for grading, see the grading progress, and have the score submitted to my LMS where I can see it.

GitHub Initiative »

Support customizing user resources and environments when launching from an LMS (like Canvas)

As an instructor, I want to specify what environment (packages, etc) and resources (RAM, GPU, CPU, etc) my students launch into based on what course they are launching from, as well as their role (TA, student, instructor, etc). This allows students to have customized experiences that are specific to the exercise I want them to do, without them having to understand accidental complexities like images, resource selection, etc. It also allows me as an instructor or my TAs to have higher resource limits, so I can experiment with course content authoring more easily.

As a JupyterHub administrator, I want to restrict the resource profiles (RAM, CPU, GPU, etc) my instructors can have access to, as a way to control overall cloud spend.

GitHub Initiative »

Build a friendly interface to allow instructors to create nbgitpuller links from within an LMS (like Canvas)

As an instructor, I want to create assignments within my university’s LMS (like Canvas) that allow my users to land in interactive content (like notebooks I authored) as part of their learning experience. The most common way I distribute content on JupyterHub is to put my content in a git repository (like GitHub) and use nbgitpuller to distribute it to my users. In each repository, I may have different labs, homeworks and course sections that put my students into different notebooks. To create links that point to the correct content, I have to use the nbgitpuller link generator, understand how the ‘Launch from Canvas’ option works, and copy a long URL over. This is more complex and more error-prone than other tools I can use from within my LMS. I would like a simpler workflow.
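The link-construction step described above can be sketched as a small helper. The `/hub/user-redirect/git-pull` endpoint and the `repo`/`urlpath`/`branch` query parameters follow the nbgitpuller URL scheme; the concrete hub and repository below are hypothetical examples.

```python
from urllib.parse import urlencode

def nbgitpuller_link(hub_url, repo, branch, path):
    """Build an nbgitpuller launch link for a JupyterHub.

    A sketch, assuming the standard git-pull query parameters;
    the hub and repository used below are hypothetical.
    """
    # urlpath tells the hub which file to open after the content is pulled
    repo_name = repo.rstrip("/").split("/")[-1]
    urlpath = f"tree/{repo_name}/{path}"
    query = urlencode({"repo": repo, "urlpath": urlpath, "branch": branch})
    return f"{hub_url}/hub/user-redirect/git-pull?{query}"

link = nbgitpuller_link(
    "https://hub.example.edu",
    "https://github.com/example-course/materials",
    "main",
    "lab1/intro.ipynb",
)
print(link)
```

A friendlier LMS-side interface would essentially wrap this construction in a form, so instructors never see the encoded URL.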

GitHub Initiative »

Support recording discrete analytics events about user actions

As a JupyterHub admin for an educational institution, I want a record of various actions (such as starting a server, opening a notebook, etc) performed by my students, to help me better understand their educational performance. I would like these events to be ingestible into our existing analytics systems, in a well-structured and documented format that we can tie into. Because I care about their privacy and the usefulness of my data, I don’t want to indiscriminately collect ‘everything’ - only an explicit set of things that I’m interested in and can disclose that I am collecting.
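The "explicit set of things" requirement could look like the following minimal sketch: an allowlist of event types, with everything else silently dropped. The event names, schema identifier, and field layout here are hypothetical, not an existing JupyterHub format.

```python
import json
from datetime import datetime, timezone

# Hypothetical allowlist: only these event types are ever recorded,
# matching the "explicit set of things" requirement above.
ALLOWED_EVENTS = {"server_start", "notebook_open"}

def record_event(event_type, username, **fields):
    """Return a structured JSON event record, or None if not allowlisted."""
    if event_type not in ALLOWED_EVENTS:
        return None  # indiscriminate collection is deliberately impossible
    return json.dumps({
        "schema": f"hub.example.edu/{event_type}/v1",  # hypothetical schema id
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event_type,
        "user": username,
        **fields,
    })

print(record_event("notebook_open", "student42", path="lab1/intro.ipynb"))
print(record_event("keystroke", "student42"))  # not allowlisted -> None
```

Because each record names its schema, a downstream analytics system can validate and document exactly what is being collected.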

GitHub Initiative »

Build governance for a `jupyterhub-contrib` organization

As a maintainer of JupyterHub, I want to provide users of JupyterHub with a lot of ancillary projects (like authenticators, mixins, spawners, etc) that make their use of JupyterHub better. However, I don’t want to then sign up the (limited) JupyterHub maintainer team to maintain an indefinite array of new things. I want to find a balanced way to indicate to end users ‘hey, this project seems to follow good standards and has a decent chance of being maintained’ without taking on the full responsibility of actually maintaining these projects forever. I would like this approach to also provide some social capital for projects: a marker to incentivize good governance & technical standards and attract multi-stakeholder maintainership.

GitHub Initiative »

Refactor how repositories are fetched in repo2docker and binderhub

As a user of various binderhub installations (both on mybinder.org and for dynamic image building in repos), I find it difficult to specify what repositories to fetch content from. I would like to just paste a URL of my repository, and have the software ‘figure it out’. Instead, I have to explicitly understand what kind of repository I’m trying to fetch, and enter that appropriately in the UI.

As a maintainer of repo2docker and binderhub, we have a lot of repeated code in both projects to support different repository providers (like git, zenodo, etc). Adding support for a new repository provider requires PRs to both projects, which implement things similarly but slightly differently. This leads to wasted effort, difficulty landing new features (such as the automatic repo detection in the ‘Awesome bar / landing page redesign’ initiative), and increased maintainer load when working with contributors. For example, both mercurial and swhid support were added to repo2docker by contributors, yet they never made it to mybinder.org because we did not have capacity to review the equivalent PRs to binderhub. I also constantly notice that new projects would benefit from this repo-fetching functionality (like binderlite - a binderhub for jupyterlite, or jupyterbook.pub - a binderhub for jupyterbook rendering) - but would have to reinvent it. I’d like to refactor repo2docker and binderhub to solve this problem.

GitHub Initiative »

Generalise cost monitoring system configuration

As a hub admin, I want a cost monitoring system that is flexible and easily configurable for my specific deployment scenario. For example:

  • I am running a cluster on GCP and want to set configs for #7 and #9

  • I have cloud costs managed by CloudBank and I want to customise resource tags that conform to CloudBank’s resource tagging schema

GitHub Initiative »

Support embedding interactive notebooks in LMSes (like Canvas)

As an instructor, I construct my courses in the LMS (like Canvas) my institution provides. As part of a course, I would like my students to have access to an interactive notebook either embedded within the LMS, or available at the click of a button (with no other intermediate steps). This provides a seamless experience for students whenever they need to work with interactive content, without having to use an entirely different authentication flow.

GitHub Initiative »

Reduce the number of conflict resolution failures when using nbgitpuller to distribute content

As a student, I am expected to click links provided to me (through my LMS, course website, Slack or other medium) that launch a Jupyter notebook pre-populated with content related to my assignment or class. This mostly works fine, preserves any changes I make to my content, and I am happy! But in some rare cases, it does not work, and throws me a scary black error box with messages about git that I don’t really understand. Usually reaching out to my TAs can fix this, but it causes me stress and lost time.

As a TA, I often have to use the JupyterHub admin interface to run git commands to fix errors faced by some students when using nbgitpuller to distribute materials. I would very much rather spend my time on helping them learn, so more automatic ways to handle errors here would save me a lot of time.

GitHub Initiative »

Support pulling content from private non-git sources with nbgitpuller

As an instructor, I want to easily distribute content to my students who are working on a JupyterHub. Other instructors use nbgitpuller with git to do so, and generally have a favorable experience, particularly with respect to merging content. However, I don’t use git or github for anything, and I do not have time to learn and use it correctly for just this one purpose. It doesn’t fit with how I develop content. I would like to be able to use the same supported mechanisms, without having to learn to use git or github. I also don’t want my content to be public - I want it only to be accessible to the students who are part of the class. My students already have access to an authenticated place where they can get data from, and I want to use the same workflow to distribute my content.

GitHub Initiative »

Support pulling content from public non-git sources with nbgitpuller

As an instructor, I want to easily distribute content to my students who are working on a JupyterHub. Other instructors use nbgitpuller with git to do so, and generally have a favorable experience, particularly with respect to merging content. However, I don’t use git or github for anything, and I do not have time to learn and use it correctly for just this one purpose. It doesn’t fit with how I develop content. I would like to be able to use the same supported mechanisms, without having to learn to use git or github. I don’t have an issue with making my content publicly available - just not with git.

GitHub Initiative »

Allow sharing my work selectively with non-hub users

As an end user, I’ve prepared content on the hub that I want to showcase interactively to people (specific decision makers who aren’t day to day users of the hub, the broad public, just a collaborator from a different org) who don’t necessarily have access to the hub. I want them to be able to simply click a link I share and have the experience I want them to have. I want this to be ephemeral, so they can come back multiple times and have the same experience from start to end, rather than polluted by previous times they have clicked this link.

As a JupyterHub admin responsible for cloud spend, I don’t want to spend an uncontrolled amount of money for people who are not my core users to access compute. I’m ok with having a specific amount of resources set aside for my users to share work, as long as it’s controlled and not open to the world. I would also like to have reports on what is being shared this way so I can justify the cloud spend.

GitHub Initiative »

Allow users to create shared folders with access control via a UI

As a user, I want to collaborate with other users on my hub on specific projects, via a shared directory that the users I collaborate with can access. This gives me a quicker and more convenient way to share work than pushing to an external git repository and having collaborators pull it.

As a student, I am working on a group project with a few other students. I want to work together in a shared project directory, so we can minimize git overhead (which we are not yet comfortable with) on the same hub.

GitHub Initiative »

Allow admins to configure 'Start Server' page (profile list) via web UI

As a JupyterHub admin, I want to have control over what software environments and resource allocation options are available to my users when they try to start a server. I currently can interact with 2i2c support or make PRs myself, but this is cumbersome and, due to timezone differences, can sometimes take days. This makes experimenting really difficult for me, as I often need several rounds of back-and-forth before we can make changes. Plus it is hard for me to know exactly how the changes will look without them being applied by an engineer. So I have to be very conservative about what I ask for, and I don’t know what all the options are.

GitHub Initiative »

Provide per-user and per-group cloud cost reporting on GCP

As a jupyterhub admin, I am responsible for paying the cloud cost incurred by my hub. The policies I set and the information I provide my users can drastically alter how much money I have to spend. To understand how to best serve my users while staying within my budget, I want to know how much cloud cost each user is roughly responsible for. This allows me to reach out to them if necessary, as well as make reports to whoever is giving me money on who is using that money for what.

Since my hub may serve many distinct groups of users, I also want to have reports of cloud spend by the groups a user belongs to, so I can talk to the people responsible for those groups directly if needed, as well as justify my budget as the cloud cost is spent in service of the goals and accomplishments of these users and groups.

By having this information, I am better able to both:

  1. Nudge my users into better practices, through training and guidance

  2. Draw a direct line from the achievements of my users using the hub to the cloud cost I spend on them

This feature is already available for hub admins on AWS, but since my hubs are on GCP, I would like this feature too.
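The rollups described above amount to grouping billing records by user and by group. As a minimal sketch (the usage records and dollar amounts below are invented illustrations, not real billing data):

```python
from collections import defaultdict

# Hypothetical usage records: (user, group, component, cost_usd), e.g. as
# exported from the cloud provider's billing data with per-user labels.
records = [
    ("alice", "geo-lab", "compute", 12.40),
    ("alice", "geo-lab", "storage", 1.10),
    ("bob",   "bio-lab", "compute", 7.25),
    ("carol", "geo-lab", "compute", 3.00),
]

def cost_by(records, key_index):
    """Sum costs grouped by one field of each record (user=0, group=1)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += rec[3]
    return dict(totals)

per_user = cost_by(records, 0)   # e.g. how much alice is responsible for
per_group = cost_by(records, 1)  # e.g. how much geo-lab spends overall
print(per_user)
print(per_group)
```

The hard part in practice is not this aggregation but attributing raw cloud line items to users and groups in the first place, which is why provider-specific (AWS vs GCP) work is needed.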

GitHub Initiative »

Provide per-hub and per-component cloud cost reporting on GCP

As a hub admin, I want to understand how much each hub my community runs costs me in cloud spend, so I can better advocate for their ongoing funding. I also want to understand how much each component (compute, storage, etc) costs so I can intelligently discuss usage with my funders and users, as well as make informed choices about quotas and resource allocations.

This is currently already possible on AWS, but not so on GCP. Since my hubs run on GCP, I would like to be able to use this feature as well.

GitHub Initiative »

Allow archiving user home directories based on usage policies

As a hub admin, I have many users who are no longer using the hub (because they graduated, finished their projects, moved on to other infrastructure, etc) but still cost me money because I am continuously storing their home directories and paying for it. I want to not pay for those inactive users anymore.
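A usage policy like this reduces to a selection rule over last-activity timestamps. The following sketch assumes a hypothetical policy (archive after a year of inactivity) and invented usernames; a real implementation would then move the selected directories to cheaper archival storage.

```python
from datetime import datetime, timedelta, timezone

def select_for_archival(last_active, now=None, max_idle_days=365):
    """Given {username: last_active_datetime}, return users whose home
    directories should be archived under an idle-time policy (a sketch;
    the 365-day threshold is an assumed example policy)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(u for u, ts in last_active.items() if ts < cutoff)

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
users = {
    "grad2019":    datetime(2022, 5, 1, tzinfo=timezone.utc),   # long gone
    "active-user": datetime(2024, 5, 20, tzinfo=timezone.utc),  # still here
}
print(select_for_archival(users, now=now))  # ['grad2019']
```

Keeping the policy as pure selection logic makes it easy to dry-run against real activity data before deleting or moving anything.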

GitHub Initiative »

Allow users to read / write from object storage like a filesystem

As an end user on the cloud, I have to use object storage (such as S3) for storing and accessing intermediate and final data products. While I use cloud native methods to do most of the work, in some cases it is very helpful to be able to access cloud object storage as if it was a traditional filesystem:

  1. When dealing with smaller intermediate and final data products produced by other systems (like an external job queue)

  2. As a way to use existing data exploration tools (including the JupyterLab file browser) that work best with traditional filesystems
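The core idea - treating flat object-store keys as paths - can be shown with a toy in-memory store standing in for a real bucket; actual tools do this against S3 and friends via FUSE mounts or fsspec-style libraries. Everything below is an illustrative sketch, not a real storage client.

```python
# Toy sketch: a flat key -> bytes object store (as in S3), wrapped in a
# filesystem-like interface where "/" in keys acts like directory structure.
class ObjectStoreFS:
    def __init__(self):
        self._objects = {}  # key -> bytes; stand-in for a real bucket

    def write(self, path, data: bytes):
        self._objects[path.lstrip("/")] = data

    def read(self, path) -> bytes:
        return self._objects[path.lstrip("/")]

    def listdir(self, prefix):
        """List immediate children under a 'directory' prefix."""
        prefix = prefix.strip("/") + "/" if prefix.strip("/") else ""
        children = {
            key[len(prefix):].split("/")[0]
            for key in self._objects
            if key.startswith(prefix)
        }
        return sorted(children)

fs = ObjectStoreFS()
fs.write("results/run1/metrics.json", b"{}")
fs.write("results/run2/metrics.json", b"{}")
print(fs.listdir("results"))  # ['run1', 'run2']
```

The `listdir` prefix scan is exactly why directory listings over large buckets can be slow, and why filesystem-style access suits the smaller intermediate products mentioned above.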

GitHub Initiative »