
K8s cluster autoscaler version incompatibility on the berkeley-geoupyter cluster

| Field | Value |
| --- | --- |
| Impact Time | Mar 31 at 15:10 to Mar 31 at 17:00 |
| Duration | 1h 49m 10s |

Overview

The cluster was running an older Kubernetes (k8s) version (1.33) than most of our other clusters. When the cluster autoscaler version was bumped a day earlier, the autoscaler and the cluster's Kubernetes version became incompatible.
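To make the failure mode concrete, here is a minimal sketch of a version-skew check. It assumes the upstream cluster autoscaler convention that each autoscaler release's minor version matches the Kubernetes minor version it was built for. The function names and example versions are hypothetical and are not part of our tooling.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as 'v1.34.2' or '1.33'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def autoscaler_matches_cluster(k8s_version: str, autoscaler_version: str) -> bool:
    """True if the autoscaler targets the same Kubernetes minor version as the cluster.

    Assumes (hypothetically, following the upstream convention) that an
    autoscaler release 1.X.Y is meant to run against Kubernetes 1.X.
    """
    return parse_minor(k8s_version) == parse_minor(autoscaler_version)


if __name__ == "__main__":
    # Hypothetical versions for illustration only.
    print(autoscaler_matches_cluster("1.33", "1.34.1"))  # False: minor versions differ
    print(autoscaler_matches_cluster("1.34", "1.34.0"))  # True
```

Comparing only major and minor versions deliberately ignores patch releases, since the assumed compatibility convention is defined per minor version.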

What Happened

New nodes were not being spawned because the cluster autoscaler wasn’t triggering scale-up events.

Resolution

Upgrading the cluster's Kubernetes version from 1.33 to 1.34 and downgrading the cluster autoscaler by one patch version resolved the issue.

Where We Got Lucky

The problem was detected by an automated health check run, not by users reporting that they could not spawn servers.

What Went Well

Context about the cluster autoscaler version bump was still fresh, so we pursued the correct line of investigation from the start.

What Didn’t Go So Well

When we designed the upgrade batches, we only considered clusters running Kubernetes 1.32, so this cluster, which was running 1.33, was missed.
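One way to avoid this kind of gap is to derive the batches from the full cluster inventory rather than from a chosen starting version. The sketch below assumes a hypothetical inventory that maps cluster names to Kubernetes versions; the cluster names, versions, and function names are placeholders, and this is not how our upgrade batches are actually generated.

```python
from collections import defaultdict


def parse_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as 'v1.34.2' or '1.33'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def plan_upgrade_batches(inventory: dict[str, str], target: str) -> dict[str, list[str]]:
    """Group every cluster that is behind `target` by its current minor version.

    Because batches come from the whole inventory instead of a hand-picked
    source version, a cluster on an unexpected version (such as 1.33) still
    lands in a batch instead of being silently skipped.
    """
    target_minor = parse_minor(target)
    batches: dict[str, list[str]] = defaultdict(list)
    for cluster, version in sorted(inventory.items()):
        major, minor = parse_minor(version)
        if (major, minor) < target_minor:
            batches[f"{major}.{minor}"].append(cluster)
    return dict(batches)


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    inventory = {
        "cluster-a": "1.32",
        "cluster-b": "1.32",
        "cluster-c": "1.33",
        "cluster-d": "1.34",
    }
    print(plan_upgrade_batches(inventory, "1.34"))
    # {'1.32': ['cluster-a', 'cluster-b'], '1.33': ['cluster-c']}
```

Clusters already at the target version (here `cluster-d`) are left out, so every cluster that appears in the output still needs an upgrade.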

Action Items

upgrades.yaml should help standardize versions across clusters