
UToronto: Users who have never logged in before can't start servers

Field | Value
Impact Time | Oct 3 at 09:11 to Oct 3 at 09:34
Duration | 23m 24s

Overview

University of Toronto is using Azure File as home directory storage, and needs the chowning initContainer. We had removed it earlier, causing new server startups for users who had never logged in before to fail. Restoring it just for utoronto fixed it.
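To make the role of the chowning initContainer concrete, here is a minimal sketch of what such a container spec could look like when built in Python for KubeSpawner's `init_containers` option. The container name, image, uid/gid, mount path, and volume name are all illustrative assumptions, not the values used in the actual 2i2c deployment.

```python
def build_chown_init_container(uid=1000, gid=1000,
                               mount_path="/home/jovyan",
                               volume_name="home"):
    """Return a Kubernetes container spec (plain dict) that chowns the home mount.

    All defaults are hypothetical; adjust them to the real deployment.
    """
    return {
        "name": "volume-mount-ownership-fix",  # hypothetical name
        "image": "busybox:1.36",               # any small image with chown works
        "command": ["sh", "-c", f"chown {uid}:{gid} {mount_path}"],
        # chown needs root, so the init container runs as uid 0
        "securityContext": {"runAsUser": 0},
        "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
    }


# In a jupyterhub_config.py this could be wired in roughly as:
# c.KubeSpawner.init_containers = [build_chown_init_container()]
```

Because the init container runs before the user's server container, the home directory already has the expected owner by the time the server starts.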

Resolution

23m 24s

Where We Got Lucky

. accidentally (otherwise this would’ve persisted for at least 3 full days)

What Went Well

  1. We were able to restore service pretty quickly once the report was acknowledged

What Didn’t Go So Well

  1. Our alerting didn’t catch this, so we had to wait for the community to notice and report it to us. This also slowed down our investigation, because we didn’t know exactly where the 500 error was coming from

  2. Our logs had no mention of this particular username, and it is unclear why
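As one possible direction for the alerting gap above, the following sketch counts 5xx responses per endpoint in a stream of access-log lines, so that a spike on a spawn endpoint could raise an alert. The log line shape (`... ] <status> <METHOD> <path> ...`) is an assumption, and the sketch only helps if the failing service actually writes such lines to a log that is being scanned, which was not established in this incident.

```python
import re
from collections import Counter

# Assumed access-log shape: "[... ] <status> <METHOD> <path> (<user>@<ip>) <time>ms"
ACCESS_LINE = re.compile(
    r"\]\s+(?P<status>\d{3})\s+(?P<method>[A-Z]+)\s+(?P<path>\S+)"
)


def count_server_errors(lines):
    """Count 5xx responses per (method, path) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ACCESS_LINE.search(line)
        # Lines that do not look like access-log entries are skipped
        if match and match["status"].startswith("5"):
            counts[(match["method"], match["path"])] += 1
    return counts
```

An alert rule could then fire when any count crosses a threshold within a time window; choosing the threshold and window is left open here.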

Action Items

Timeline

Oct 2, 2025

Time | Event
8:00 AM | 2i2c-org/infrastructure#6873 was merged, removing initContainers doing chown from our infrastructure following rollout of jupyterhub-home-nfs everywhere

Oct 3, 2025

Time | Event
7:00 AM | https://2i2c.freshdesk.com/a/tickets/4038 comes in, reporting that some users have had trouble starting servers, with ‘500 Internal Server’ errors, since the previous day
9:11 AM | Acknowledged as an outage and created a PagerDuty P1 incident: UToronto: Users who have never logged in before can’t start servers
9:15 AM | Checking hub logs (both the existing logs and jupyterhub.log on the persistent dir) for the username of the user who had issues turns up nothing. An issue with the login service is considered: the error is an ‘internal server error’, but there is no clear idea of which service it is coming from.
9:20 AM | An engineer is able to recreate the issue by deleting their own home directory and trying to start a server (details in 2i2c-org/infrastructure#6888). This was attempted based on intuition and on remembering that there had been recent changes to initContainers.
9:30 AM | 2i2c-org/infrastructure#6887 was deployed locally, restoring service. This was communicated to the community
9:34 AM | UToronto: Users who have never logged in before can’t start servers
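The reproduction in the timeline relied on a fresh home directory being created without the chown step. A small check along the following lines could be used to confirm whether a given home directory is owned by the notebook user; the function name and the idea that ownership alone explains the failure are assumptions made for illustration.

```python
import os


def home_dir_owned_by(path, uid):
    """Return True if `path` exists and is owned by `uid`, else False."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        # A user who has never logged in may have no directory yet
        return False
    return info.st_uid == uid
```

For example, `home_dir_owned_by("/home/jovyan", 1000)` would be expected to return False inside a server pod whose home mount was created as root and never chowned, assuming the notebook user has uid 1000.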