- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Tags:
- Sprint: nxplatform #87
- Story Points: 3

Since the Kubernetes cluster was upgraded to 1.22, the cluster autoscaler's "ScaleDown" event deletes the Nexus pod, which then takes a long time to restart because of "Unable to attach or mount volumes" errors.
While Nexus is down, CI jobs are blocked and fail with errors such as:
Connect to nexus:80 [nexus/10.63.255.85] failed: Connection refused (Connection refused)
ScaleDown events:
k get events --sort-by='lastTimestamp' | grep "deleting pod"
7m17s Normal ScaleDown pod/nexus-84f644b8f8-lnnzs deleting pod for node scale down ...
This matches the behavior change documented at https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler:
"In GKE clusters with control plane version 1.22 or later, Pods with local storage no longer block scaling down."
Possible fix: adding the following annotation to the Nexus pod:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
See https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-autoscaler-visibility.
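For illustration, the annotation proposed above could be set on the pod template roughly as follows. This is a partial manifest sketch, not the actual Nexus configuration: the Deployment name "nexus" is an assumption inferred from the pod name nexus-84f644b8f8-lnnzs, and the fragment leaves out the selector and container spec. If Nexus is deployed through a Helm chart, the same annotation would more likely be passed through a pod annotations value of that chart.

```yaml
# Partial Deployment fragment (sketch). The name "nexus" is assumed,
# inferred from the pod name in the ScaleDown event above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nexus
spec:
  template:
    metadata:
      annotations:
        # Must sit on the pod template, not on the Deployment's own metadata,
        # because the cluster autoscaler reads the annotation from the pod.
        # The value has to be a quoted string.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

One trade-off to keep in mind: with this annotation, the autoscaler will not remove the node that hosts the Nexus pod, so that node stays up even when it is otherwise underused.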
Also see NXBT-3604 and possible fix https://github.com/jenkinsci/helm-charts/blob/main/charts/jenkins/README.md#long-volume-attachmount-times.
- is related to: NXBT-3604 [PlatformCI] Fix Chartmuseum stuck at ContainerCreating (Resolved)