Improved Error Handling in nbgitpuller

Background

nbgitpuller is a Jupyter Server extension that exposes a mechanism for synchronising remote content with the server’s local file-system. In the wild, its primary application is connecting JupyterHub users with hub-adjacent content through a simple, distributable, user-friendly interface (URLs). Because it pulls remote content into an individual user’s server, it is often used to separate content from compute-environment definitions in contexts like JupyterHub and BinderHub.

There are two distinct personas that use nbgitpuller:

Link-author
People creating content that can be shared via an nbgitpuller link.
Link-consumer
People that use an nbgitpuller link to access shared content.

Between fetching remote content and merging conflicts with local edits, there are many ways in which nbgitpuller users can encounter errors during normal operation. Fixing these errors is the responsibility of neither link-authors nor link-consumers. Instead, there is a third persona:

nbgitpuller expert
People with the technical expertise to debug problems encountered during nbgitpuller usage. It is an established nbgitpuller development goal that this role disappears in the future.

The existing UX for error handling conflates the nbgitpuller-expert persona with the link-consumer and link-author personas. As such, it leaves room for improvement, for example by adding error recovery mechanisms, or by designing error responses that consider the needs of the link-consumer and link-author personas in addition to those of the nbgitpuller-expert.

Technical details

nbgitpuller operates as a Jupyter Server extension that exposes a number of request handlers:

The front-end UI served at /git-pull/ communicates with the API backend using server-sent events.
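To make the server-sent-events exchange concrete, here is a minimal sketch of how a client could decode such a stream. Only the `data:` framing and the blank-line event terminator come from the SSE format itself; the `phase`, `output`, and `message` fields in the sample payload are assumptions made for illustration, not a documented nbgitpuller schema. The parser deliberately ignores other SSE fields such as `event:` and `id:`.

```python
import json


def parse_sse(stream_text):
    """Yield decoded JSON payloads from a server-sent-events text stream."""
    data_lines = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line terminates the current event.
            yield json.loads("\n".join(data_lines))
            data_lines = []


# Hypothetical payloads; the field names are assumptions for illustration.
sample = (
    'data: {"phase": "syncing", "output": "Fetching..."}\n'
    "\n"
    'data: {"phase": "error", "message": "fetch failed"}\n'
    "\n"
)
events = list(parse_sse(sample))
```

An error-aware UI would branch on whichever field marks a failed event (here the assumed `phase` value) rather than scanning raw console output.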

When used alongside a JupyterHub, there is a strong separation of concerns between provisioning of the compute environment (JupyterHub and e.g. KubeSpawner) and provisioning of the file-system (nbgitpuller). Using the /hub/user-redirect/ endpoint, content authors can craft user-agnostic URLs that invoke the nbgitpuller service.
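As an illustration of such a user-agnostic URL, the sketch below assembles a link from a hub address and a repository. The hub and repository addresses are placeholders, and the `branch` and `urlpath` parameter names are assumptions here (only `repo` appears in this document), so check them against the nbgitpuller documentation before relying on them.

```python
from urllib.parse import urlencode


def build_nbgitpuller_link(hub_url, repo, branch=None, urlpath=None):
    """Build a user-agnostic nbgitpuller link via /hub/user-redirect/."""
    params = {"repo": repo}
    if branch:
        params["branch"] = branch  # assumed parameter name
    if urlpath:
        params["urlpath"] = urlpath  # assumed parameter name
    # urlencode percent-encodes the repository URL so it survives as one value.
    query = urlencode(params)
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


# Placeholder hub and repository, for illustration only.
link = build_nbgitpuller_link(
    "https://hub.example.org/",
    "https://github.com/example/course-materials",
    branch="main",
    urlpath="lab/tree/course-materials/index.ipynb",
)
```

Because the hub resolves `/hub/user-redirect/` to the server of whoever follows the link, the same URL can be handed to every link-consumer.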

The nbgitpuller URL handler (e.g. GET /git-pull?repo=...) implements several operations to fulfil a request:

  1. Remote content is fetched from a Git repository scoped to a specific branch (fetch).

  2. Fetched content is merged with the local file-system, resolving any conflicts in an opinionated manner to minimise user-input (merge).

  3. The user is redirected to the given URL path once (1) and (2) have completed (open).

Deliverables

Add access to a Jupyter frontend following an nbgitpuller error

Overview

For some users, nbgitpuller links are the only way they know to access a deployed JupyterHub. At present, when such a user encounters an error after following an nbgitpuller link (e.g. because the link is malformed), they are left without any navigation links or buttons that would take them to the “preferred”[1] frontend, e.g. JupyterLab. These users need a way to reach the preferred frontend application without editing the URL in their browser or otherwise navigating to the JupyterHub by themselves.

We will extend the existing error response of the nbgitpuller web UI to provide a means of accessing the preferred frontend, such as through the addition of a clickable link or button. An automatic redirect should not be used, as it hinders the ability of the link user to capture debugging information for the link author when the link fails.
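One possible shape for this, sketched under assumptions: pick the frontend path from the link's query string if present, otherwise fall back to a default, and render it as an explicit link instead of redirecting. The function names, the `lab` default, and the link text are all hypothetical; how the pre-determined singleuser endpoint is actually discovered is left open here.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def preferred_frontend_path(link, default_path="lab"):
    """Return the path named in the link's query, else a default path."""
    query = parse_qs(urlsplit(link).query)
    # Accept either spelling; the exact parameter name is an assumption.
    values = query.get("urlpath") or query.get("urlPath")
    return values[0] if values else default_path


def render_frontend_link(base_url, link):
    """Render a clickable link to the preferred frontend (no auto-redirect)."""
    # Stripping leading slashes keeps the target under the user's own server.
    path = preferred_frontend_path(link).lstrip("/")
    target = base_url.rstrip("/") + "/" + path
    return f'<a href="{escape(target, quote=True)}">Continue to Jupyter</a>'


html_link = render_frontend_link("/user/alice/", "/git-pull?repo=broken")
```

Rendering a link rather than issuing a redirect keeps the error page on screen, so the link-consumer can still capture debugging information before moving on.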

Definition of done

Estimates

| Task | Lower Estimate | Upper Estimate |
| --- | --- | --- |
| Build routine to identify “preferred” UI application | 2h | 3h |
| Design and implement UI | 1h | 3h |
| Open pull-request and shepherd through to merge | 2h | 4h |
| Additional learning and refinement | 1h | 3h |
| Total | 6h | 13h |

Overview

The existing error-handling response for nbgitpuller is a thin abstraction which exposes many of the error details to the user. In practice, many users may not be familiar with Git, and/or may have limited ability to interpret the error messages. When designing for the link-consumer persona, we should prioritise simple, readable error messages that provide sufficient scope for the link-author (e.g. Teaching Assistants, Lecturers). Crucially, it should be possible for the majority of nbgitpuller errors to be understood without the use of the existing console window to read error log output. A convenient way to share the log outputs should be added, such as a Copy to clipboard button.

Fundamental changes to the technology stack, such as introducing a new UI framework, are NOT in scope.

Definition of done

Estimates

| Task | Lower Estimate | Upper Estimate |
| --- | --- | --- |
| Design and implement UI | 5h | 8h |
| Open pull-request and shepherd through to merge | 2h | 4h |
| Additional learning and refinement | 1h | 3h |
| Total | 8h | 15h |

Identify common nbgitpuller errors

Overview

Within the space of possible errors that can occur during typical usage of nbgitpuller, there are several common classes, such as invalid links and renamed or deleted files. Through inspection of logs from existing (large) nbgitpuller deployments, we will determine which nbgitpuller invocations failed and the mechanism by which they failed. By analysing the resulting set of events, we will identify the most frequent failure modes, normalised by nbgitpuller URL.
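A minimal sketch of the normalisation step, assuming the raw logs have already been turned into structured events: each link contributes at most once per failure mode, so that one broken link followed by many users is not counted many times. This is one plausible reading of "normalised by URL". The event fields (`url`, `error`) and the error-class labels are hypothetical, and the sample events are made up for illustration.

```python
from collections import Counter, defaultdict


def failure_modes_by_link(events):
    """Count, for each failure mode, how many distinct links exhibited it."""
    modes_per_link = defaultdict(set)
    for event in events:
        if event["error"] is not None:
            modes_per_link[event["url"]].add(event["error"])
    counts = Counter()
    for modes in modes_per_link.values():
        counts.update(modes)
    return counts


# Made-up events: link A fails three times the same way, B and C once each.
events = [
    {"url": "A", "error": "repo_not_found"},
    {"url": "A", "error": "repo_not_found"},
    {"url": "A", "error": "repo_not_found"},
    {"url": "B", "error": "merge_conflict"},
    {"url": "C", "error": "merge_conflict"},
    {"url": "C", "error": None},  # a successful invocation
]
counts = failure_modes_by_link(events)
```

With these sample events the raw tally would rank `repo_not_found` first (three occurrences), whereas the per-link tally ranks `merge_conflict` first because it affects two distinct links.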

Definition of done

Estimates

| Task | Lower Estimate | Upper Estimate |
| --- | --- | --- |
| Liaise with appropriate personas associated with existing JupyterHub deployments | 4h | 11h |
| Generate structured events from raw logs | 3h | 7h |
| Analyse nbgitpuller events to identify common error types | 2h | 4h |
| Open pull-request and shepherd through to merge | 2h | 4h |
| Additional learning and refinement | 1h | 3h |
| Total | 12h | 29h |

Design and integrate dedicated error handlers

Overview

For the set of common error classes identified in the previous deliverable, we will design up to three bespoke responses that clearly articulate what went wrong to link users that encounter each error. Although it will be the responsibility of the link author to resolve these problems, improving the error message will help guide the user to useful documentation and/or provide more context for the link author when the error is reported.

The primary objective of this deliverable is to reduce the requirement for link authors to draw conclusions from inline console tracebacks. The approach taken in this work should naturally extend to alternative content providers, should they be added in future.

Once each error class has a dedicated response, nbgitpuller will be extended to return that response when it identifies that the corresponding error class has been encountered.
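The mapping from a raw failure to a dedicated response could look roughly like the sketch below. The three classes, their regular expressions, and the message wording are illustrative assumptions: the real classes are meant to come out of the log analysis in the previous deliverable, and the patterns would need checking against the Git output that nbgitpuller actually captures.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorResponse:
    error_class: str
    message: str  # short text aimed at the link-consumer


# Hypothetical patterns over Git's stderr; verify against real logs.
_PATTERNS = [
    (re.compile(r"repository '.*' not found", re.IGNORECASE),
     ErrorResponse("repo_not_found",
                   "The repository in this link could not be found. "
                   "Please report the link to whoever shared it.")),
    (re.compile(r"remote branch .* not found", re.IGNORECASE),
     ErrorResponse("branch_not_found",
                   "The branch named in this link does not exist.")),
    (re.compile(r"could not resolve host", re.IGNORECASE),
     ErrorResponse("network",
                   "The content host could not be reached. Try again later.")),
]

FALLBACK = ErrorResponse("unknown", "Something went wrong while fetching content.")


def classify(stderr_text):
    """Return the first dedicated response whose pattern matches, else a fallback."""
    for pattern, response in _PATTERNS:
        if pattern.search(stderr_text):
            return response
    return FALLBACK


response = classify("fatal: repository 'https://example.org/x.git/' not found")
```

Keeping the patterns in a table separate from the handler means a future content provider could register its own entries without changing the classification logic.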

Definition of done

Estimates

| Task | Lower Estimate | Upper Estimate |
| --- | --- | --- |
| Build error-handling routines to process and identify common failure modes | 3h | 7h |
| Design and implement UI | 7h | 12h |
| Update nbgitpuller documentation | 1h | 2h |
| Open pull-request and shepherd through to merge | 2h | 4h |
| Additional learning and refinement | 1h | 3h |
| Total | 14h | 28h |

Additional overheads

In addition to per-deliverable work, there is an up-front cost that each developer may incur:

| Task | Lower Estimate | Upper Estimate |
| --- | --- | --- |
| Become familiar with nbgitpuller architecture | 2h | 4h |
| Set up development environment | 1h | 2h |
| Total | 3h | 6h |

We will assume that two separate developers incur this cost.
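For reference, the per-deliverable totals and this overhead can be combined as follows. This is simple arithmetic over the totals stated in the estimate tables, assuming the overhead is added once per developer; the dictionary keys are descriptive labels only (the second deliverable has no heading in this document).

```python
# (lower, upper) totals in hours, copied from the estimate tables.
deliverables = {
    "Frontend access after an error": (6, 13),
    "Simplified error messages (untitled deliverable)": (8, 15),
    "Identify common errors": (12, 29),
    "Dedicated error handlers": (14, 28),
}
overhead_per_developer = (3, 6)
developers = 2

lower = sum(lo for lo, _ in deliverables.values()) + developers * overhead_per_developer[0]
upper = sum(hi for _, hi in deliverables.values()) + developers * overhead_per_developer[1]
print(f"Overall estimate: {lower}h to {upper}h")  # Overall estimate: 46h to 97h
```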

Relevant GitHub Issues

Listed below are pertinent GitHub Issues open in the jupyterhub/nbgitpuller repository:

People working on this

This project would require capacity from:

  1. App Engineer (1 implementation, 1 review)

Timeline

Footnotes
  1. Where “preferred” refers to either the pre-determined singleuser endpoint, or the application indicated in the urlPath query.