Background¶
nbgitpuller is a Jupyter Server extension that exposes a mechanism for synchronising remote content with the server’s local file-system. In the wild, its primary application lies in connecting JupyterHub users with hub-adjacent content through a simple distributable, user-friendly interface (URLs). By virtue of pulling remote content within an individual user’s server, it is often used to facilitate the separation of content from compute-environment definitions in contexts like JupyterHub and BinderHub, where rebuilds of the single-user environment are costly and slow.
There are two main personas that use nbgitpuller:
- Link-author
  - People creating content that can be shared via an `nbgitpuller` link.
- Link-consumer
  - People that use an `nbgitpuller` link to access shared content.
Between fetching remote content and merging conflicts with local edits, there are many ways in which nbgitpuller users can encounter errors during normal operation. Fixing these errors is the responsibility of neither link-authors nor link-consumers. Instead, there is a third persona:
- `nbgitpuller` expert
  - People with the technical expertise to debug problems encountered during `nbgitpuller` usage.
Every problem that requires the intervention of an nbgitpuller expert introduces a dependency upon the availability of the expert, limiting the scalability of nbgitpuller. Reducing the necessity of this role, e.g. by improving conflict resolution, represents a desirable goal for the project.
Technical details¶
nbgitpuller operates as a Jupyter Server extension that exposes a number of request handlers:
- `GET /git-pull/api`: an API service endpoint.
- `GET /git-pull/`: a user-facing UI for triggering and following a git pull operation.
The front-end UI served at /git-pull/ communicates with the API backend using server-sent events.
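To make the server-sent events exchange concrete, the following is a minimal sketch of a client-side parser for an event stream. It follows the general server-sent events wire format (`data:` fields, with a blank line terminating each event). The JSON payload shape, including the `phase` and `output` keys, is an assumption made for illustration and should be checked against the actual API responses.

```python
import json


def parse_sse(lines):
    """Yield the decoded JSON payload of each server-sent event in `lines`."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        # Comment lines (starting with ":") and other fields are ignored here.


# Hypothetical stream content; the payload keys are assumptions.
stream = [
    'data: {"phase": "syncing", "output": "Fetching..."}\n',
    "\n",
    ": keep-alive comment\n",
    'data: {"phase": "finished"}\n',
    "\n",
]
events = list(parse_sse(stream))
```

A browser front-end would normally rely on the built-in `EventSource` API rather than parsing by hand; the sketch only illustrates what travels over the connection.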
When used alongside a JupyterHub, there is a strong separation of concerns between provisioning of the compute environment (JupyterHub and e.g. KubeSpawner) and provisioning of the file-system (nbgitpuller). Using the /hub/user-redirect/ endpoint, content authors can craft user-agnostic URLs that invoke the nbgitpuller service.
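As an illustration of such a user-agnostic URL, the sketch below assembles a link from a hub address and a repository. The hub address and repository are placeholders. The `repo` parameter appears in the handler description in this document; `branch` and `urlpath` are assumed parameter names that should be confirmed against the nbgitpuller documentation before use.

```python
from urllib.parse import urlencode


def make_nbgitpuller_link(hub_url, repo, branch=None, urlpath=None):
    """Build a /hub/user-redirect/ link that invokes the git-pull handler."""
    params = {"repo": repo}
    if branch:
        params["branch"] = branch  # assumed parameter name
    if urlpath:
        params["urlpath"] = urlpath  # assumed parameter name
    # JupyterHub replaces /hub/user-redirect/ with the visiting user's own
    # server prefix, so the same link works for every user.
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


# Placeholder hub and repository, for illustration only.
link = make_nbgitpuller_link(
    "https://hub.example.org",
    "https://github.com/example/course-content",
    branch="main",
    urlpath="lab/tree/course-content/index.ipynb",
)
```

Percent-encoding the query values with `urlencode` keeps the repository URL intact when it is itself embedded inside another URL.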
The nbgitpuller URL handler (e.g. GET /git-pull?repo=...) implements several operations to fulfil a request:
1. Remote content is fetched from a Git repository scoped to a specific branch (`fetch`).
2. Fetched content is merged with the local file-system, resolving any conflicts in an opinionated manner to minimise user-input (`merge`).
3. Redirect the user to the given URL path once (1) and (2) have been completed (`open`).
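The ordering of these three operations can be sketched as a small generator that reports one event per phase and stops at the first failure. This is not the nbgitpuller implementation: the function name, the callables, and the event dictionaries are all hypothetical and exist only to show the control flow.

```python
def run_pull(fetch, merge, open_url):
    """Run fetch, then merge, then open, yielding one event per phase."""
    for name, step in (("fetch", fetch), ("merge", merge)):
        try:
            step()
        except Exception as exc:
            # A failed phase ends the request; the redirect never happens.
            yield {"phase": name, "status": "error", "detail": str(exc)}
            return
        yield {"phase": name, "status": "ok"}
    # Only reached once both fetch and merge have completed.
    yield {"phase": "open", "status": "ok", "redirect": open_url}


def failing_merge():
    raise RuntimeError("conflict in notebook.ipynb")


ok = list(run_pull(lambda: None, lambda: None, "/lab/tree/content"))
failed = list(run_pull(lambda: None, failing_merge, "/lab/tree/content"))
```

In the failing case the sequence ends with the `merge` error event, which is the kind of situation that currently calls for an nbgitpuller expert.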
Deliverables¶
Identify common nbgitpuller merge errors¶
Overview¶
After fetching content from a content-source, nbgitpuller is responsible for unifying the remote content with the local user’s filesystem (see (2) above). Where the link-consumer and link-author have each made edits to the same file, it may be possible to account for both sets of changes in a lossless merge operation. However, there are some situations in which it is not possible to merge both the remote and local changes in a conflict-free manner. On these occasions, nbgitpuller should resolve conflicts by preferring the remote content, whilst also preserving the link-consumer’s edits.
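The policy described above can be modelled with a toy three-way comparison on whole files. This is a deliberately simplified sketch, not nbgitpuller's merge routine: it treats any file changed on both sides as a conflict (a real text merge can often combine non-overlapping edits losslessly), and the backup suffix is a hypothetical naming choice.

```python
def merge_file(name, base, local, remote, backup_suffix=".local-backup"):
    """Return {filename: content} for one file under a remote-preferring policy.

    base   -- content at the last common version
    local  -- the link-consumer's current content
    remote -- the link-author's current content
    """
    if local == remote or remote == base:
        # Nothing new arrived from the remote, or both sides agree.
        return {name: local}
    if local == base:
        # Only the remote changed, so the update applies cleanly.
        return {name: remote}
    # Both sides changed: prefer the remote content, but keep the
    # link-consumer's edits in a separate (hypothetically named) backup file.
    return {name: remote, name + backup_suffix: local}


remote_only = merge_file("a.txt", "v1", "v1", "v2")
local_only = merge_file("a.txt", "v1", "mine", "v1")
both_changed = merge_file("a.txt", "v1", "mine", "theirs")
```

Only the last case needs an opinionated decision; the first two are conflict-free.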
The nature of these kinds of failures means that they’re often content-dependent, and there are anecdotal reports of nbgitpuller failing to properly resolve merge conflicts in the wild. These kinds of failures are difficult for link-author and link-consumer personas to resolve; often this requires intervention from the nbgitpuller-expert persona. Through inspection of logs from existing (large) nbgitpuller deployments, we will learn more about these kinds of failures in real-world deployments.
Definition of done¶
- An array of structured `nbgitpuller` events has been generated from the logs of existing large JupyterHub deployments.
- A set of common error types has been established from analysis of `nbgitpuller` event information.
- A set of reproducible merge failures has been identified.
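One way the first two items might be approached is sketched below: raw lines are parsed into structured events, and error events are bucketed by type. Everything specific here is hypothetical. The line format, the sample lines, and the keyword-to-type mapping are placeholders rather than real nbgitpuller output, and the actual deployment logs would need to be inspected before settling on a parser.

```python
import re
from collections import Counter

# Hypothetical line layout: "<timestamp> <LEVEL> <message>".
LINE_RE = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")

# Hypothetical keyword -> error type mapping, to be refined from real data.
ERROR_TYPES = {
    "conflict": "merge-conflict",
    "could not resolve host": "network",
}


def to_event(line):
    """Turn one raw line into a structured event, or None if it does not parse."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["error_type"] = None
    if event["level"] == "ERROR":
        lowered = event["message"].lower()
        event["error_type"] = next(
            (kind for needle, kind in ERROR_TYPES.items() if needle in lowered),
            "unclassified",
        )
    return event


# Placeholder lines for illustration; these are not real log output.
SAMPLE = [
    "2024-01-01T10:00:00Z INFO example: sync started",
    "2024-01-01T10:00:02Z ERROR example: conflict while merging file",
    "2024-01-01T10:05:00Z ERROR example: could not resolve host",
    "2024-01-01T10:06:00Z ERROR example: something else went wrong",
    "malformed-line-without-fields",
]
events = [e for e in map(to_event, SAMPLE) if e is not None]
counts = Counter(e["error_type"] for e in events if e["error_type"])
```

Keeping an explicit `unclassified` bucket makes it visible how much of the error volume the current set of error types fails to explain.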
Estimates¶
| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Generate structured events from raw logs | 6h | 10h |
| Analyse nbgitpuller events to identify common error types | 4h | 8h |
| Open pull-request and shepherd through to merge | 4h | 8h |
| Additional learning and refinement | 2h | 6h |
| Total | 16h | 32h |
Implement fixes to Git-based merge routines¶
Overview¶
Following the work in the first deliverable, a set of reproducible merge failures will have been identified. Subsequently, work may be done to reduce the likelihood of these kinds of failures. By hardening nbgitpuller against failure during nominal usage, it may be possible to eliminate, or at least diminish the importance of, the nbgitpuller-expert persona.
Alongside implementing fixes for these newly identified merge-failure scenarios, work should be done to embed reproducible test-cases in the nbgitpuller test suite.
Definition of done¶
- A set of reproducible merge failures has been prepared as a test-suite.
- The extended `nbgitpuller` test suite passes.
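A reproducible merge failure could be captured along the following lines. This is a sketch under stated assumptions, not a test from the nbgitpuller suite: it drives the `git` command line directly, uses a local branch named `upstream` as a stand-in for the link-author's remote, and resolves the conflict with Git's `-X theirs` merge option as one possible way of preferring remote content. With that option the link-consumer's edit survives only in the commit history, not in the working tree.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(cwd, *args):
    """Run git non-interactively with a fixed identity."""
    env = dict(
        os.environ,
        GIT_AUTHOR_NAME="Test", GIT_AUTHOR_EMAIL="test@example.org",
        GIT_COMMITTER_NAME="Test", GIT_COMMITTER_EMAIL="test@example.org",
    )
    return subprocess.run(
        ["git", "-c", "commit.gpgsign=false", *args],
        cwd=cwd, env=env, capture_output=True, text=True,
    )


def reproduce_conflict():
    """Build a conflicting history and merge it twice; None if git is missing."""
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "note.txt"
        git(tmp, "init")
        note.write_text("original\n")
        git(tmp, "add", "note.txt")
        git(tmp, "commit", "-m", "base")
        # "upstream" stands in for the link-author's remote branch.
        git(tmp, "checkout", "-b", "upstream")
        note.write_text("author edit\n")
        git(tmp, "commit", "-am", "author change")
        # Back on the original branch, the link-consumer edits the same line.
        git(tmp, "checkout", "-")
        note.write_text("consumer edit\n")
        git(tmp, "commit", "-am", "consumer change")
        # A plain merge stops with conflict markers in the file.
        plain = git(tmp, "merge", "--no-edit", "upstream")
        conflicted = plain.returncode != 0 and "<<<<<<<" in note.read_text()
        git(tmp, "merge", "--abort")
        # Retrying with -X theirs takes the upstream side of conflicting hunks.
        resolved = git(tmp, "merge", "--no-edit", "-X", "theirs", "upstream")
        return {
            "conflicted": conflicted,
            "resolved": resolved.returncode == 0,
            "content": note.read_text(),
        }


result = reproduce_conflict()
```

Wrapped in a test function, each identified failure would get its own small history like this one, so that a regression shows up as a failing merge rather than as a user report.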
Estimates¶
| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Create reproducible tests for existing merge-failures | 8h | 12h |
| Implement fixes for these test failures | 12h | 20h |
| Open pull-request and shepherd through to merge | 4h | 8h |
| Additional learning and refinement | 2h | 6h |
| Total | 26h | 46h |
Additional overheads¶
In addition to per-deliverable work, there is an up-front cost that may be incurred by each developer:
| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Become familiar with nbgitpuller architecture | 4h | 8h |
| Set up development environment | 2h | 3h |
| Total | 6h | 11h |
We will assume that two separate developers incur this cost.
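Putting the per-deliverable totals and the overheads together gives an overall range. The short calculation below assumes that the totals simply add up and that the overhead is counted once for each of the two developers; it is a worked sum of the figures in the tables, not an additional estimate.

```python
# (lower, upper) totals in hours, copied from the estimate tables.
ESTIMATES = {
    "Identify common nbgitpuller merge errors": (16, 32),
    "Implement fixes to Git-based merge routines": (26, 46),
}
OVERHEAD_PER_DEVELOPER = (6, 11)
DEVELOPERS = 2

lower = sum(lo for lo, _ in ESTIMATES.values()) + DEVELOPERS * OVERHEAD_PER_DEVELOPER[0]
upper = sum(hi for _, hi in ESTIMATES.values()) + DEVELOPERS * OVERHEAD_PER_DEVELOPER[1]

print(f"Overall estimate: {lower}h to {upper}h")  # Overall estimate: 54h to 100h
```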
Intentionally out of scope¶
For this statement of work, we are leaving the following as intentionally out of scope:
Use of alternative conflict resolution mechanisms besides Git.
Relevant GitHub Issues and external links¶
Listed below are pertinent GitHub Issues open in the jupyterhub/nbgitpuller repository, and other external resources:
- https://nbgitpuller.readthedocs.io/en/latest/topic/automatic-merging.html
- https://curriculum-guide.datahub.berkeley.edu/support/troubleshooting/nbgitpuller
People working on this¶
This project would require capacity from:
App Engineer (1 implementation, 1 review)