- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: ADDONS_10.10, ADDONS_2021
- Fix Version/s: coldstorage-2021.0.0, coldstorage-10.0.0
- Component/s: CI/CD, ColdStorage, Retention
- Epic Link:
- Tags:
- Sprint: nxplatform #55
- Story Points: 3

CI is typically failing with:
[2022-02-10T19:54:34.444Z] Installed chart platform/nuxeo with name test-release into namespace nuxeo-coldstorage-10-10-dev
[2022-02-10T19:54:34.989Z] + kubectl rollout status statefulset test-release-redis-master --timeout=5m --namespace=nuxeo-coldstorage-10-10-dev
[2022-02-10T19:54:34.990Z] Waiting for 1 pods to be ready...
[2022-02-10T19:59:41.584Z] error: timed out waiting for the condition
Looking at the pod during the execution:
kubectl -n nuxeo-coldstorage-pr-189-dev describe pod test-release-redis-master-0
We can see the following events:
Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Normal   NotTriggerScaleUp  3m27s                cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) had taint {team: napps}, that the pod didn't tolerate, 1 node(s) had taint {dedicated: nodes-startup}, that the pod didn't tolerate, 1 max node group size reached, 1 node(s) had taint {team: nos}, that the pod didn't tolerate, 1 node(s) had taint {team: platform}, that the pod didn't tolerate, 1 node(s) had taint {team: ui}, that the pod didn't tolerate
  Warning  FailedScheduling   46s (x5 over 3m29s)  default-scheduler   0/28 nodes are available: 10 node(s) had taint {team: ai}, that the pod didn't tolerate, 2 node(s) had taint {team: ui}, that the pod didn't tolerate, 5 node(s) had taint {team: napps}, that the pod didn't tolerate, 5 node(s) had taint {team: platform}, that the pod didn't tolerate, 6 node(s) had taint {dedicated: nodes-startup}, that the pod didn't tolerate.
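The cause is bad indentation in our nuxeo-test-base-values.yaml: because of it, we do not properly override the tolerations value from https://github.com/bitnami/charts/blob/master/bitnami/redis/values.yaml. The Redis master pod therefore does not tolerate the taints on the available nodes and cannot be scheduled.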
I am not yet sure why this only started causing failures now and not earlier.
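
To illustrate the kind of indentation mistake described above, here is a hypothetical sketch. It is not the actual content of nuxeo-test-base-values.yaml. It assumes the Redis subchart values live under a top-level `redis` key and that the intended toleration is for the `team: platform` taint with effect `NoSchedule`; both are assumptions for illustration only. What is known from the Bitnami chart is that the master pod reads its tolerations from `master.tolerations`, which defaults to an empty list.

```yaml
# Hypothetical example, not the real nuxeo-test-base-values.yaml.
# Mis-indented: `tolerations` is a sibling of `master` instead of a child,
# so `master.tolerations` keeps the chart default ([]) and the pod
# tolerates none of the node taints.
redis:
  master:
    persistence:
      enabled: false
  tolerations:
    - key: team          # assumed taint key/value, for illustration only
      operator: Equal
      value: platform
      effect: NoSchedule # assumed effect
---
# Correctly indented: `tolerations` is nested under `master`, so it
# overrides `master.tolerations` from the Bitnami redis values.yaml.
redis:
  master:
    persistence:
      enabled: false
    tolerations:
      - key: team
        operator: Equal
        value: platform
        effect: NoSchedule
```

Whether the override took effect could then be checked on the running pod by inspecting the tolerations listed in the `kubectl describe pod` output.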