OOM kill of enforce-xfs-quota on neurohackademy
| Field | Value |
|---|---|
| Impact Time | Oct 16 at 15:00 to Oct 16 at 16:14 |
| Duration | 1h 14m |
Overview¶
The enforce-xfs-quota container experienced OOM following Angus quota changes
What Happened¶
The hard quota for the neurohackademy prod hub was modified, Oct 16 at 15:00 to Oct 16 at 16:14 leading to an update of all projects. The generator script encountered an OOM error when processing a large directory (_shared).
Where We Got Lucky¶
The engineer noticed this when comparing configurations between clusters.
Action Items¶
Modify enforce-xfs-quota script to reduce likelihood of OOM.
Timeline¶
Oct 16, 2025¶
| Time | Event |
|---|---|
| 3:42 PM | An engineer notices the enforce-xfs-quota container restarting |
| 3:47 PM | The engineer creates a debug container and runs the generator |
| 4:00 PM | The engineer successfully ran the script, and investigated why it was crashing for the main container |