Improvements to Conflict Resolution in nbgitpuller

Background

nbgitpuller is a Jupyter Server extension that exposes a mechanism for synchronising remote content with the server’s local file-system. In the wild, its primary application lies in connecting JupyterHub users with hub-adjacent content through a simple, distributable, user-friendly interface (URLs). Because it pulls remote content from within an individual user’s server, it is often used to separate content from compute-environment definitions in contexts like JupyterHub and BinderHub, where rebuilds of the single-user environment are costly and slow.

There are two main personas that use nbgitpuller:

Link-author
People creating content that can be shared via an nbgitpuller link.
Link-consumer
People that use an nbgitpuller link to access shared content.

Between fetching remote content and merging it with local edits, there are many ways in which nbgitpuller users can encounter errors during normal operation. Fixing these errors is the responsibility of neither link-authors nor link-consumers. Instead, there is a third persona:

nbgitpuller expert
People with the technical expertise to debug problems encountered during nbgitpuller usage.

Every problem that requires the intervention of an nbgitpuller expert introduces a dependency upon the availability of the expert, limiting the scalability of nbgitpuller. Reducing the necessity of this role, e.g. by improving conflict resolution, represents a desirable goal for the project.

Technical details

nbgitpuller operates as a Jupyter Server extension that exposes a number of request handlers:

The UI served at /git-pull/ communicates with the API backend from the front-end using server-sent-events.

When used alongside a JupyterHub, there is a strong separation of concerns between provisioning of the compute environment (JupyterHub and e.g. KubeSpawner) and provisioning of the file-system (nbgitpuller). Using the /hub/user-redirect/ endpoint, content authors can craft user-agnostic URLs that invoke the nbgitpuller service.
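As a minimal sketch of how a link-author might craft such a user-agnostic URL, the snippet below assembles a link from its parts. The hub address, repository, and notebook path are placeholders, and `build_nbgitpuller_link` is a hypothetical helper rather than part of nbgitpuller. The `repo` parameter appears in the handler example in this document; `branch` and `urlpath` are assumed here to be the query parameters that select the branch and the post-pull redirect target.

```python
from urllib.parse import urlencode


def build_nbgitpuller_link(hub_url, repo, branch=None, urlpath=None):
    """Build a link routed through /hub/user-redirect/ (hypothetical helper).

    Assumption: `branch` and `urlpath` are accepted as query parameters
    alongside `repo`.
    """
    params = {"repo": repo}
    if branch:
        params["branch"] = branch
    if urlpath:
        params["urlpath"] = urlpath
    # urlencode percent-escapes the repository URL so it survives as a
    # single query-string value.
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


# Placeholder values for illustration only.
link = build_nbgitpuller_link(
    "https://hub.example.org/",
    "https://github.com/example/course-content",
    branch="main",
    urlpath="lab/tree/course-content/index.ipynb",
)
print(link)
```

Because the link goes through `/hub/user-redirect/`, the same URL can be distributed to every link-consumer; the hub resolves it to each user's own server.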

The nbgitpuller URL handler (e.g. GET /git-pull?repo=...) implements several operations to fulfil a request:

  1. Remote content is fetched from a Git repository scoped to a specific branch (fetch).

  2. Fetched content is merged with the local file-system, resolving any conflicts in an opinionated manner to minimise user-input (merge).

  3. The user is redirected to the given URL path once (1) and (2) have completed (open).

Deliverables

Identify common nbgitpuller merge errors

Overview

After fetching content from a content-source, nbgitpuller is responsible for unifying the remote content with the local user’s filesystem (see (2) above). Where the link-consumer and link-author have each made edits to the same file, it may be possible to account for both sets of changes in a lossless merge operation. However, there are some situations in which it is not possible to merge both the remote and local changes in a conflict-free manner. On these occasions, nbgitpuller should resolve conflicts by preferring the remote content, whilst also preserving the link-consumer’s edits.
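The policy described above (prefer remote content, but never discard the link-consumer’s edits) can be illustrated with a small planning function. This is a sketch under simplifying assumptions, not nbgitpuller’s actual implementation: it treats every file changed on both sides as a conflict, ignoring the lossless-merge case, and the `Resolution` type, the action names, and the timestamped backup naming are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class Resolution:
    path: str
    action: str  # "keep-local", "take-remote", or "backup-then-take-remote"
    backup_path: Optional[str] = None


def plan_resolution(local_changed, remote_changed, now):
    """Decide, per file, how to reconcile local and remote edits.

    Simplification: a file changed on both sides is always treated as a
    conflict; a real merge might combine both sets of changes losslessly.
    """
    stamp = now.strftime("%Y%m%d%H%M%S")
    plan = []
    for path in sorted(set(local_changed) | set(remote_changed)):
        if path in local_changed and path in remote_changed:
            # Preserve the consumer's edits under a new name, then
            # prefer the remote version at the original path.
            p = PurePosixPath(path)
            backup = str(p.with_name(f"{p.stem}_{stamp}{p.suffix}"))
            plan.append(Resolution(path, "backup-then-take-remote", backup))
        elif path in remote_changed:
            plan.append(Resolution(path, "take-remote"))
        else:
            plan.append(Resolution(path, "keep-local"))
    return plan


plan = plan_resolution(
    local_changed={"a.ipynb", "notes.md"},
    remote_changed={"a.ipynb", "data.csv"},
    now=datetime(2024, 1, 1, 12, 0, 0),
)
for step in plan:
    print(step)
```

Separating the plan from its execution keeps the policy easy to test without touching a real repository.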

These failures are often content-dependent, and there are anecdotal reports of nbgitpuller failing to properly resolve merge conflicts in the wild. They are difficult for the link-author and link-consumer personas to resolve; often this requires intervention from the nbgitpuller expert persona. Through inspection of logs from existing (large) nbgitpuller deployments, we will learn more about how these failures occur in real-world deployments.
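The log inspection could start from something like the sketch below, which turns raw lines into structured events and tallies error types. The log layout (`<timestamp> <LEVEL> <message>`), the sample lines, and the category names are all assumptions made for illustration; real deployment logs will differ and the patterns would need to be derived from them.

```python
import re
from collections import Counter

# Assumed log layout: "<timestamp> <LEVEL> <message>".
LINE_RE = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")

# Hypothetical categories keyed on fragments of Git-style error output.
ERROR_PATTERNS = {
    "merge-conflict": re.compile(r"CONFLICT \("),
    "local-changes-overwritten": re.compile(r"would be overwritten by merge"),
    "index-lock": re.compile(r"index\.lock"),
}


def classify(message):
    """Return the first matching category, or "other" if none match."""
    for name, pattern in ERROR_PATTERNS.items():
        if pattern.search(message):
            return name
    return "other"


def parse_events(lines):
    """Convert raw log lines into dicts, skipping lines that do not parse."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        event = match.groupdict()
        is_error = event["level"] == "ERROR"
        event["error_type"] = classify(event["message"]) if is_error else None
        events.append(event)
    return events


# Invented sample lines for illustration only.
sample = [
    "2024-01-01T12:00:00Z INFO Fetching remote content",
    "2024-01-01T12:00:02Z ERROR CONFLICT (content): Merge conflict in intro.ipynb",
    "2024-01-01T12:05:00Z ERROR error: Your local changes would be overwritten by merge",
    "2024-01-01T12:06:00Z ERROR CONFLICT (content): Merge conflict in lab1.ipynb",
    "2024-01-01T12:07:00Z ERROR something unexpected",
    "not a log line",
]
events = parse_events(sample)
counts = Counter(e["error_type"] for e in events if e["error_type"])
print(counts.most_common())
```

Keeping an explicit "other" bucket makes it visible how much of the error volume the current patterns fail to explain.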

Definition of done

Estimates

| Task | Lower Estimate | Upper Estimate |
| --- | --- | --- |
| Generate structured events from raw logs | 6h | 10h |
| Analyse nbgitpuller events to identify common error types | 4h | 8h |
| Open pull-request and shepherd through to merge | 4h | 8h |
| Additional learning and refinement | 2h | 6h |
| Total | 16h | 32h |

Implement fixes to Git-based merge routines

Overview

Following the work in the first deliverable, a set of reproducible merge failures will have been identified. Subsequently, work may be done to reduce the likelihood of these kinds of failures. By hardening nbgitpuller against failure during nominal usage, it may be possible to eliminate the nbgitpuller expert persona, or at least to diminish its importance.

Alongside implementing fixes for these newly identified merge-failure scenarios, work should be done to embed reproducible test-cases in the nbgitpuller test suite.

Definition of done

Estimates

| Task | Lower Estimate | Upper Estimate |
| --- | --- | --- |
| Create reproducible tests for existing merge-failures | 8h | 12h |
| Implement fixes for these test failures | 12h | 20h |
| Open pull-request and shepherd through to merge | 4h | 8h |
| Additional learning and refinement | 2h | 6h |
| Total | 26h | 46h |

Additional overheads

In addition to per-deliverable work, there is an up-front cost that each developer must pay:

| Task | Lower Estimate | Upper Estimate |
| --- | --- | --- |
| Become familiar with nbgitpuller architecture | 4h | 8h |
| Set up development environment | 2h | 3h |
| Total | 6h | 11h |

We will assume that two separate developers incur this cost.
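For reference, the overall range implied by these figures can be worked out as follows. This is a sketch that assumes the deliverable totals and the per-developer overheads are simply additive; the document itself does not state a combined total.

```python
# (lower, upper) estimates in hours, taken from the tables in this document.
deliverables = {
    "Identify common nbgitpuller merge errors": (16, 32),
    "Implement fixes to Git-based merge routines": (26, 46),
}
overhead_per_developer = (6, 11)
developers = 2  # stated assumption: two developers incur the overhead

# Assumption: estimates add linearly, with overhead paid once per developer.
lower = sum(lo for lo, _ in deliverables.values()) + developers * overhead_per_developer[0]
upper = sum(hi for _, hi in deliverables.values()) + developers * overhead_per_developer[1]

print(f"Overall estimate: {lower}h to {upper}h")
```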

Intentionally out of scope

For this statement of work, we are leaving the following as intentionally out of scope:

  1. Use of alternative conflict resolution mechanisms besides Git.

Listed below are pertinent GitHub Issues open in the jupyterhub/nbgitpuller repository, and other external resources:

People working on this

This project would require capacity from:

  1. App Engineer (1 implementation, 1 review)

Timeline