Sustainable first class Canvas Authentication & Authorization support

Desired outcomes

Context

Canvas

Canvas is a popular open source LMS that is favored by large universities. It has an extensive REST API that provides a lot of useful functionality. In particular, it provides standardized OAuth2 that can be used for authentication (establishing identity of the user) and courses & groups for authorization (differential access based on membership).

JupyterHub and Canvas

JupyterHub supports authentication and authorization with OAuth2 providers using OAuthenticator. Over the last few years, we (2i2c) have helped contribute features to the GenericOAuthenticator which allows for integration with all kinds of OAuth2 providers, rather than writing a specific authenticator for each provider (like GitHub, etc). This allows for long term maintainability (Outcome 3) as well as faster deploys without having to write new code for each provider.

While this works straightforwardly for authentication, it does not quite work for authorization. There are no real standards at the OAuth2 level for fetching group memberships, as each provider has a different idea of what groups mean based on what kind of application it is. For example, in GitHub we sync with orgs and teams, while with Canvas we want courses and groups. This is an essential complexity of the problem, and we want to find elegant ways to solve it while keeping Outcome 3 in mind.

Prior art

Prior art that we should learn from (and which current 2i2c members were heavily involved in):

  1. PR to add a specific Canvas Authenticator. This was rejected due to its repetitive nature, and the need to maintain a lot of Canvas-specific code in the upstream JupyterHub project. This actively led to work on making the GenericOAuthenticator more capable.

  2. UC Berkeley’s CanvasOAuthenticator, a separately maintained version of the previously mentioned PR. It has been successfully used for thousands of students for 4+ years now, and contains lessons we can learn. In particular, we want to figure out how we can provide all this functionality without having to maintain an external Authenticator, so it can be sustainably upstreamed (Outcome 3).

Deliverables

Deliverable 1: Canvas Authentication with GenericOAuthenticator

Overview

GenericOAuthenticator already has enough functionality to provide authentication (but not authorization) with Canvas. We will set up a staging hub with Canvas authentication to make sure this works, and to test our processes, since this requires provisioning keys for our use from Canvas via University IT departments.
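A minimal sketch of what such a configuration might look like is shown here. It assumes Canvas's OAuth2 endpoints (`/login/oauth2/auth`, `/login/oauth2/token`) and its user profile endpoint (`/api/v1/users/self/profile`). The helper names `canvas_oauth_settings` and `apply_settings`, the example URLs, and the default choice of `login_id` as the username claim are illustrative assumptions, not decisions made in this document; choosing the right user identifier is part of the investigation task in the estimates.

```python
def canvas_oauth_settings(canvas_url, hub_url, username_claim="login_id"):
    """Build GenericOAuthenticator settings for one Canvas instance.

    Hypothetical helper: the keys are GenericOAuthenticator option names,
    the values are derived from the Canvas and hub base URLs.
    """
    canvas_url = canvas_url.rstrip("/")
    hub_url = hub_url.rstrip("/")
    return {
        "login_service": "Canvas",
        "authorize_url": f"{canvas_url}/login/oauth2/auth",
        "token_url": f"{canvas_url}/login/oauth2/token",
        "userdata_url": f"{canvas_url}/api/v1/users/self/profile",
        "oauth_callback_url": f"{hub_url}/hub/oauth_callback",
        # Assumption: which profile field identifies the user is still to be decided.
        "username_claim": username_claim,
    }


def apply_settings(config_section, settings):
    """Copy each setting onto a config object such as c.GenericOAuthenticator."""
    for key, value in settings.items():
        setattr(config_section, key, value)


# In jupyterhub_config.py, where JupyterHub provides `c`, this might be used as:
#   c.JupyterHub.authenticator_class = "generic-oauth"
#   apply_settings(
#       c.GenericOAuthenticator,
#       canvas_oauth_settings("https://canvas.example.edu", "https://hub.example.edu"),
#   )
#   # client_id and client_secret are the credentials provisioned by University IT.
```

Keeping the URL construction in one function means each staging hub only needs its own Canvas and hub base URLs plus its credentials.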

Definition of Done

Risk Factors

Demo at the end of this deliverable

A URL to a hub with working Canvas authentication enabled, that anyone with access to that Canvas instance can use to log in.

Estimates

| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Provision OAuth2 credentials from University IT | 2h | 4h |
| Set up one staging hub with these credentials & document it | 3h | 6h |
| Investigate user identifiers & write a migration plan for home directories | 4h | 6h |
| Migrate all the staging hubs (3 total) & verify they work | 3h | 4h |
| Total | 12h | 20h |

(2025-10-28: Totals were updated by colliand following a request from Harneer Batra.)

Who works on this?

Deliverable 2: Migrate all production hubs to using Canvas authentication

Overview

Once we are comfortable with Deliverable 1, we roll this out carefully to all the other hubs. Particular care must be taken here to make sure home directories continue to work correctly.
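One way to reduce the home directory risk is to compute a rename plan and review it before touching any storage. The sketch below assumes home directories are named after the user identifier and that a mapping from old to new identifiers is available; the function name `plan_home_dir_renames` and the sample identifiers are hypothetical.

```python
def plan_home_dir_renames(old_to_new, existing_dirs):
    """Return (renames, problems) without modifying anything on disk.

    old_to_new:    dict mapping current identifiers to post-migration identifiers
    existing_dirs: names of the home directories that currently exist
    """
    existing = set(existing_dirs)
    renames = []    # (old, new) pairs that are safe to apply
    problems = []   # (old, new, reason) tuples that need a human decision
    claimed = {}    # new name -> old name that already claimed it

    for old, new in sorted(old_to_new.items()):
        if old not in existing or old == new:
            continue  # no directory to move, or the name does not change
        if new in existing:
            problems.append((old, new, "target directory already exists"))
        elif new in claimed:
            problems.append((old, new, f"target already claimed by {claimed[new]}"))
        else:
            claimed[new] = old
            renames.append((old, new))
    return renames, problems


if __name__ == "__main__":
    mapping = {"alice@uni.example": "alice", "bob@uni.example": "bob"}
    renames, problems = plan_home_dir_renames(mapping, ["alice@uni.example"])
    print(renames, problems)
```

Separating planning from execution lets the plan be agreed upon with the University as part of the migration timeline, and any collisions surface before users are affected.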

Definition of done

Risk factors

Demo at the end of this deliverable

All production hubs have users logging in via Canvas

Estimates

| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Verify & document how resetting hub session cookie affects running users | 1h | 2h |
| Make a migration plan with timelines agreed upon by 2i2c & the University | 2h | 3h |
| Migrate highmem hub | 4h | 8h |
| Migrate r hub | 4h | 8h |
| Migrate main hub | 4h | 8h |
| Map existing home directory names to new names (if user identifiers are different, as determined in migration plan in Deliverable 1)* | 6h | 8h |
| Watch for and address any support issues for a week | 4h | 4h |
| Total | 25h | 41h |

Who works on this?

Deliverable 3: Build jupyterhub_oauthenticator_authz_helpers

Overview

Since this contribution we made to OAuthenticator, the general pattern for bringing groups into JupyterHub from an external source is:

  1. Talk to the external API to fetch groups information during login and figure out what groups the user belongs to

  2. Put that in the user’s auth_state

  3. Use auth_state_groups_key to pick that out as groups information

This allows easy separation of concerns - auth_state can securely contain many different pieces of info about the user, and auth_state_groups_key can be used to determine groups. If in the future we want to refresh group information more regularly, that can be done by simply refreshing auth_state.

This pattern also matches how this is done in OAuthenticator itself for providers it directly supports (see GitHub for example).
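The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the planned implementation: it assumes OAuthenticator's `modify_auth_state_hook` option accepts a callable of the shape `(authenticator, auth_state) -> auth_state`, and that `auth_state` carries an `access_token` entry. The names `courses_to_groups`, `make_canvas_groups_hook`, the `canvas_groups` key, and the `course::<id>` naming scheme are all hypothetical.

```python
import asyncio


def courses_to_groups(courses):
    """Turn Canvas course records into group names (naming scheme is an assumption)."""
    return sorted(f"course::{course['id']}" for course in courses)


def make_canvas_groups_hook(fetch_courses):
    """Build a hook with the (authenticator, auth_state) -> auth_state shape.

    `fetch_courses` is an async callable that takes an access token and returns
    a list of course dicts. In a real deployment it would call the Canvas API;
    injecting it keeps this sketch runnable without network access.
    """
    async def hook(authenticator, auth_state):
        # Step 1: ask the external API which courses the user belongs to.
        courses = await fetch_courses(auth_state["access_token"])
        # Step 2: store the result in a copy of auth_state under a dedicated key.
        updated = dict(auth_state)
        updated["canvas_groups"] = courses_to_groups(courses)
        return updated

    return hook


# Step 3 happens in configuration, for example (assumed option names):
#   c.GenericOAuthenticator.modify_auth_state_hook = make_canvas_groups_hook(...)
#   c.GenericOAuthenticator.manage_groups = True
#   c.GenericOAuthenticator.auth_state_groups_key = "canvas_groups"

if __name__ == "__main__":
    async def fake_fetch_courses(access_token):
        # Stand-in for a real Canvas API call.
        return [{"id": 101}, {"id": 202}]

    demo_hook = make_canvas_groups_hook(fake_fetch_courses)
    print(asyncio.run(demo_hook(None, {"access_token": "example-token"})))
```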

While we could simply do (1) in our config, this is neither scalable nor upstreamable (Outcome 3). Instead, we want to create a new Python package, jupyterhub_oauthenticator_authz_helpers, that contains helpful utilities for fetching groups info from various OAuth2 providers. This allows anyone to compose the various pieces of info they want to fetch in (1) without having to copy-paste Python code into YAML everywhere.
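As a rough illustration of what "composing" helpers could mean, the sketch below chains several small async helpers into a single hook and merges the group lists they produce. The package's actual API is not yet designed; `compose_helpers`, `merged_groups`, and the key names are hypothetical, and the sketch assumes `auth_state_groups_key` may be given a callable that receives the auth_state dict.

```python
import asyncio


def compose_helpers(*helpers):
    """Combine helpers into one (authenticator, auth_state) -> auth_state hook.

    Each helper is an async callable taking (authenticator, auth_state) and
    returning a dict of new keys to merge into auth_state.
    """
    async def hook(authenticator, auth_state):
        updated = dict(auth_state)
        for helper in helpers:
            # Each helper sees the state produced by the helpers before it.
            updated.update(await helper(authenticator, updated))
        return updated

    return hook


def merged_groups(*keys):
    """Return a callable that unions the group lists stored under several keys."""
    def pick(auth_state):
        groups = set()
        for key in keys:
            groups.update(auth_state.get(key, []))
        return sorted(groups)

    return pick


if __name__ == "__main__":
    # Stand-ins for real helpers that would query the Canvas API.
    async def course_helper(authenticator, auth_state):
        return {"canvas_courses": ["course::101"]}

    async def group_helper(authenticator, auth_state):
        return {"canvas_groups": ["group::7"]}

    hook = compose_helpers(course_helper, group_helper)
    state = asyncio.run(hook(None, {"access_token": "example-token"}))
    print(merged_groups("canvas_courses", "canvas_groups")(state))
```

With a design along these lines, a hub's YAML config would only need to name which helpers to combine, rather than embed the fetching logic itself.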

There is prior art that we can use, in accordance with its license and with the blessing of the people who wrote it.

Definition of done

Risk factors

Demo at the end of this deliverable

Estimates

| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Set up the Python project | 1h | 2h |
| Set up local Canvas environment for testing | 4h | 8h |
| Build helper function for fetching Canvas course enrollments into auth_state | 4h | 10h |
| Build helper function for fetching Canvas group membership into auth_state | 4h | 10h |
| Implement an additional, non-Canvas auth_state helper to ensure the design is not tied to Canvas | 6h | 8h |
| Build scaffolding so admins can compose various helper functions to pick up authorization info into auth_state | 8h | 16h |
| Add package to the hub image, and test on a staging hub | 8h | 16h |
| Configure staging hub to turn enrollments into JupyterHub groups | 4h | 6h |
| Test that restricting users based on the courses they are in works (and fix bugs if it doesn't) | 4h | 8h |
| Test that jupyterhub-groups-exporter picks these up, so Grafana reporting shows groups | 4h | 6h |
| Test that we can show different profile options to users based on group membership | 2h | 4h |
| Write a blog post announcing this work (and credit everyone) | 2h | 4h |
| Total | 51h | 98h |

Who works on this?

Deliverable 4: Upstream governance work towards setting up jupyterhub-contrib

Overview

As part of both Outcome 3 and our Right to Replicate, we want to ensure that code we write is upstreamed as much as possible. This requires governance of our code to be multi stakeholder, which allows for a large community of users to pitch in towards long term maintenance. Historically, this has meant upstreaming projects that have a wide user base into the JupyterHub organization itself. However, as JupyterHub has matured and grown, this is not necessarily viable - the number of projects with a wide audience is much larger than what the JupyterHub core team can maintain. While keeping projects under the 2i2c-org organization is temporarily ok, that is not as good a long term space as building a proper multi-stakeholder space where such projects can exist.

There is ongoing governance work in the JupyterHub ecosystem that 2i2c folks are involved in towards setting up a jupyterhub-contrib space that is exactly that. Given how broadly used Canvas is, and the desire to continue using it without having to be the sole maintainers of it, jupyterhub_oauthenticator_authz_helpers would make a fantastic addition to such a space once it exists.

As part of this project, to contribute towards Outcome 3, we would like to spend some hours doing the governance and community work required to set this project space up. It may not be fully complete as part of this particular statement of work, but by collectively putting hours towards it via various statements of work with different communities, we are able to provide value to all of them without any one of them having to bear the whole cost.

Definition of Done

Risk factors

Demo at the end of this deliverable

Either a charter document about the existence of jupyterhub-contrib, or a report on how we have spent the 20h and what the current status is.

Estimates

Who will work on this

Deliverable 5: Deploy jupyterhub_oauthenticator_authz_helpers to production hubs

Overview

At the end of Deliverable 3, we will have deployed Canvas groups functionality to a staging hub. Similar to Deliverable 2, we will now roll this out carefully to all the production hubs.

Definition of done

Risk Factors

Demo at the end of this deliverable

Same as Deliverable 3 but for all production hubs.

Estimates

| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Migrate highmem hub, potentially restrict it to specific sets of users | 4h | 6h |
| Migrate r hub | 2h | 4h |
| Migrate main hub | 2h | 4h |
| Watch for and address any support issues for a week | 4h | 4h |
| Total | 12h | 18h |

Who works on this?