Enforce-nfs-quota pod restarts on victor
| Field | Value |
|---|---|
| Impact Time | Oct 17 at 12:40 to Oct 17 at 13:00 |
| Duration | 19m 23s |
Overview¶
The enforce-nfs-quota container on the victor prod hub Angus encountered a fatal error after attempting to update the global hard-quota.
What Happened¶
Oct 17 at 12:40 to Oct 17 at 13:00 After a hard-quota change, the enforce-xfs-quota container attempted to reconcile the new quotas with the existing filesystem. When running through each user, the stderr output from the recursive project loop included non UTF-8 byte, which caused a fatal error when decoding for logging purposes. 19m 23s
Resolution¶
A patched version of jupyterhub_home_nfs.generate was *All times listed in this report are in executed in a debug sidecar, which resolved the filesystem with Edinburgh, London. the intended quotas.
Where We Got Lucky¶
An engineer was already working on other NFS clusters, and was quick to respond
What Went Well¶
The time to author and push an in-place fix was small
Action Items¶
Deploy the patched version of jupyterhub_home_nfs to all clusters through our deployment infrastructure
Timeline¶
Oct 17, 2025¶
| Time | Event |
|---|---|
| 12:53 PM | Engineer runs enforcement script in debug pod, sees encoding traceback |
| 1:00 PM | Engineer patches jupyterhub_home_nfs and runs the patched version in a debug container |
| 1:02 PM | Engineer confirms that pod is now stable after updating quotas |