A New Binderbot for Remote Execution of MyST Projects

Background

BinderHub is

[...] a kubernetes-based cloud service that allows users to share reproducible interactive computing environments from code repositories. [...]

https://binderhub.readthedocs.io/en/latest/

Its most popular deployment is the public MyBinder federation, which exposes an unauthenticated BinderHub instance under the https://mybinder.org domain.

The Binderbot CLI is a tool for executing local Jupyter Notebooks on a remote BinderHub, facilitating the use of expensive cloud resources in restricted computation environments such as (free) GitHub Actions runners. It was developed by the Pangeo community, but has received little development over the past few years and is now considered end-of-life. The command-line interface accepts a set of notebook filenames to execute and the URL of the BinderHub that should perform the execution.

The existing code is fragile, and does not integrate directly with tools like Jupyter Book or the MyST Document Engine for building rich narrative experiences from computational notebooks. This statement of work outlines an approach to the remote-execution problem that replaces Binderbot with a CLI tool and a GitHub Action for starting a BinderHub session. The intention is to replace Binderbot with a small, fit-for-purpose tool whilst leaving room for future efforts that focus on the broader problems associated with remote computation.

User stories

Binderbot is a technical tool that facilitates the following user stories:

Technical details

Under the hood, Binderbot handles a number of distinct responsibilities:

  1. Establishing a BinderHub session on a remote BinderHub using a configurable GitHub repository to determine the environment specification.

  2. Starting a kernel for remote code execution.

  3. Uploading local notebooks to the remote BinderHub session.

  4. Sending a code shim that handles execution of a particular notebook.

  5. Writing executed notebooks back to the local file system.

  6. Tearing down the BinderHub session.
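The first responsibility, establishing a session, can be sketched as follows. This is a minimal illustration that assumes the build endpoint and event-stream format described in the BinderHub API documentation (`/build/gh/<owner>/<repo>/<ref>`, with JSON payloads carrying a `phase` field); the function names `build_endpoint` and `find_ready_event` are hypothetical and are not taken from Binderbot.

```python
import json
from urllib.parse import quote


def build_endpoint(hub_url: str, owner: str, repo: str, ref: str = "HEAD") -> str:
    """Return the BinderHub build endpoint for a GitHub repository."""
    spec = f"{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
    return f"{hub_url.rstrip('/')}/build/gh/{spec}"


def find_ready_event(lines):
    """Scan server-sent event lines and return the first 'ready' payload.

    Raises RuntimeError if the build fails or the stream ends early.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alive comments and blank separators
        event = json.loads(line[len("data:"):])
        phase = event.get("phase")
        if phase == "failed":
            raise RuntimeError(event.get("message", "build failed"))
        if phase == "ready":
            # The ready event carries the session URL and its access token.
            return {"url": event["url"], "token": event["token"]}
    raise RuntimeError("event stream ended before the session was ready")
```

In practice the lines would come from a streaming HTTP GET against the endpoint; the parser is kept separate from the network call so that it can be exercised offline.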

Tools like Jupyter Book 2 and the MyST Document Engine integrate with Jupyter Server via the Jupyter services REST API, which facilitates starting kernels, uploading files, etc. Presently, like Binderbot, these tools do not attempt to validate that the local and remote environments share the same resources (such as data files). Instead, fragments of code are sent to the remote kernel, and the responses consumed by the application.
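As an illustration of the kind of REST call involved, the sketch below builds (but does not send) a request that asks a Jupyter Server to start a kernel. It assumes the standard `POST /api/kernels` endpoint and token-based `Authorization` header of Jupyter Server; the helper name `kernel_start_request` and the default kernel name are hypothetical.

```python
import json
from urllib.request import Request


def kernel_start_request(server_url: str, token: str, kernel_name: str = "python3") -> Request:
    """Build a request that asks Jupyter Server to start a new kernel."""
    return Request(
        url=f"{server_url.rstrip('/')}/api/kernels",
        data=json.dumps({"name": kernel_name}).encode("utf-8"),
        headers={
            # Jupyter Server accepts the session token in this header form.
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the JSON response describes the new kernel, including its `id`.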

The MyST Document Engine may consume the URL of a running Jupyter Server rather than attempting to start a local server. It follows that by building a new application (clinder) whose sole responsibility is managing the lifecycle of a BinderHub session, it will be possible to perform remote execution of MyST projects by passing the URL of a running BinderHub session to the MyST Document Engine. Future work may build upon this platform to invoke remote procedure calls, upload files, and perform other useful functions through the REST API exposed by Jupyter Server.
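One way the hand-off to the MyST Document Engine might look is sketched below. It assumes that the session is described by a URI of the form `<url>?token=<token>` and that MyST reads the server location from the `JUPYTER_BASE_URL` and `JUPYTER_TOKEN` environment variables; both are assumptions for illustration, and `myst_environment` is a hypothetical helper.

```python
import os
from urllib.parse import parse_qs, urlsplit, urlunsplit


def myst_environment(session_uri: str, base_env=None) -> dict:
    """Split a 'URL?token=...' session URI into environment variables.

    JUPYTER_BASE_URL and JUPYTER_TOKEN are assumed variable names.
    """
    parts = urlsplit(session_uri)
    tokens = parse_qs(parts.query).get("token")
    if not tokens:
        raise ValueError("session URI does not carry a token query parameter")
    # Drop the query string so that only the server address remains.
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    env = dict(os.environ if base_env is None else base_env)
    env["JUPYTER_BASE_URL"] = base_url
    env["JUPYTER_TOKEN"] = tokens[0]
    return env
```

The resulting mapping could then be passed as `env=` to `subprocess.run(["myst", "build", "--execute"], ...)`, so that execution happens on the remote session rather than on a locally started server.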

Deliverables

Build a new CLI for managing an unauthenticated BinderHub session

Overview

The existing Binderbot CLI has too many responsibilities. A replacement tool will be built that:

  1. Starts a single remote BinderHub session (clinder start), and outputs the running session information as structured data or a URI-with-token.

  2. Tears down the existing session (clinder stop <URI>).
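The two subcommands could be laid out roughly as below. This is only a sketch of the interface shape, written in Python for illustration rather than in the Node.js planned for the tool; the `--hub`, `--repo`, and `--json` options and the `format_session` helper are hypothetical and are not part of any agreed design.

```python
import argparse
import json
from urllib.parse import quote


def make_parser() -> argparse.ArgumentParser:
    """Describe a CLI with 'start' and 'stop <URI>' subcommands."""
    parser = argparse.ArgumentParser(prog="clinder")
    commands = parser.add_subparsers(dest="command", required=True)

    start = commands.add_parser("start", help="launch a BinderHub session")
    start.add_argument("--hub", default="https://mybinder.org")  # hypothetical option
    start.add_argument("--repo", required=True)  # hypothetical: owner/name
    start.add_argument("--json", action="store_true")  # hypothetical option

    stop = commands.add_parser("stop", help="shut down a running session")
    stop.add_argument("uri", help="URI-with-token printed by 'start'")
    return parser


def format_session(url: str, token: str, as_json: bool) -> str:
    """Render session details as structured data or as a URI-with-token."""
    if as_json:
        return json.dumps({"url": url, "token": token})
    return f"{url}?token={quote(token, safe='')}"
```

Keeping the output format a pure function of the URL and token means that the same session can be printed for humans or consumed by other tools without a second launch.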

Under the hood, the replacement tool retains only two of Binderbot's responsibilities:

  1. Establishing a BinderHub session on a remote BinderHub using a configurable GitHub repository to determine the environment specification.

  2. Tearing down the BinderHub session.

Definition of done

Estimates

| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Familiarise oneself with MyST execution and Binderbot v1 | 1h | 3h |
| Build Node.js script to launch and stop BinderHub sessions | 1h | 3h |
| Publish script as package on GitHub and npm with CI releases | 2h | 4h |
| Code review | 1h | 1h |
| Total | 5h | 11h |

Build a new GitHub action for managing BinderHub sessions in CI

Overview

Project Pythia is most likely to consume the new clinder tool inside a GitHub Actions workflow. A new GitHub Action will be built to simplify this workflow, for example by handling session shutdown automatically. The new GitHub Action may also set the environment variables that tools like the MyST Document Engine need in order to consume the BinderHub session URL, and determine the current GitHub repository for defining the BinderHub specification.
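A rough sketch of what such an Action might do internally is given below, again in Python for illustration. It relies on GitHub Actions' default `GITHUB_REPOSITORY`, `GITHUB_SHA`, and `GITHUB_ENV` variables and on the `::add-mask::` workflow command; the exported names `JUPYTER_BASE_URL` and `JUPYTER_TOKEN` are assumptions, the helper names are hypothetical, and using the commit SHA as the ref is one choice among several.

```python
import os


def binder_spec(env) -> str:
    """Derive an 'owner/repo/ref' spec from GitHub Actions default variables."""
    return f"{env['GITHUB_REPOSITORY']}/{env['GITHUB_SHA']}"


def session_env_lines(url: str, token: str) -> str:
    """Format session details as NAME=value lines (assumed variable names)."""
    return f"JUPYTER_BASE_URL={url}\nJUPYTER_TOKEN={token}\n"


def export_session(url: str, token: str, env=os.environ) -> None:
    """Make the session available to later workflow steps."""
    # Ask the runner to redact the token from subsequent log output.
    print(f"::add-mask::{token}")
    # Lines appended to the GITHUB_ENV file become environment variables.
    with open(env["GITHUB_ENV"], "a", encoding="utf-8") as handle:
        handle.write(session_env_lines(url, token))
```

Session shutdown would be handled separately, for example in a post-job step, so that the session is released even when a later step fails.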

Definition of done

Estimates

| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Create GitHub action that outputs BinderHub information as variables | 1h | 3h |
| Publish GitHub action to GitHub Actions Marketplace | 1h | 2h |
| Create demonstration resource that uses this action | 1h | 2h |
| Code review | 1h | 1h |
| Total | 4h | 8h |

Support Project Pythia in migrating to the new tool

Overview

Project Pythia has a cookbook template and a number of cookbook repositories that have been built from it. Each of these repositories references a bespoke set of cookbook actions that centralise the implementation of cookbook CI/CD semantics. We will dedicate time to upgrading this set of GitHub Actions so that they use the new clinder tool.

Definition of done

Estimates

| Task | Lower Estimate | Upper Estimate |
|---|---|---|
| Become familiar with the Project Pythia cookbook actions | 1h | 1h |
| Update the cookbook actions to use the new clinder tool & deploy to a test branch | 1h | 2h |
| Fork an existing cookbook and test it against the forked actions | 1h | 2h |
| Identify and fix outstanding bugs | 1h | 3h |
| Review and deploy new version of actions to main branch | 1h | 2h |
| Total | 5h | 10h |

Intentionally out of scope

For this statement of work, the following are intentionally out of scope:

  1. A job-scheduling system (i.e. total-project execution on a remote BinderHub).

  2. Sidecar state management (i.e. uploading data files to the remote BinderHub session, modifying the environment variables).
    See jupyter-environment-provisioner for an example of updating the environment variables.

  3. Authenticated BinderHub support. This is an important use case, but one that we should follow up on in a second SoW.

Listed below are pertinent GitHub Issues:

People working on this

This project would require capacity from:

  1. App Engineer (1 implementation, 1 review)

Timeline