  1. Nuxeo ECM Build/Test Environment
  2. NXBT-3588

[Kubernetes CI] Fix broken Nexus after node pool upgrade

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Continuous Integration

      Description

      Since 27 January 2022, the Nexus pod has been down in all team namespaces, stuck in ImagePullBackOff:

      jenkins-x-nexus-85b47df67d-jcgts            0/1     ImagePullBackOff    0          3d18h
      
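      To spot the affected pods in a namespace, a listing like the one above can be filtered on its STATUS column. This is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, ...); the function name `failed_image_pulls` is made up for illustration and does not come from this issue.

```shell
# Print the names of pods whose STATUS indicates an image pull failure.
# Reads `kubectl get pods` output on stdin.
# Assumption: STATUS is the third column, i.e. the listing was produced
# without --all-namespaces (which would prepend a NAMESPACE column).
failed_image_pulls() {
  awk '$3 == "ImagePullBackOff" || $3 == "ErrImagePull" { print $1 }'
}

# Usage (requires cluster access; <team-namespace> is a placeholder):
#   kubectl get pods -n <team-namespace> | failed_image_pulls
```

      Filtering on an exact column value rather than grepping the whole line avoids matching pods that merely have the status text in their name.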

      On that date, GKE ran a scheduled automatic upgrade of the node pools from 1.18+ to 1.19+:

      $ gcloud container operations list
      operation-1643289053648-21fbca0e  UPGRADE_NODES  us-east1-b  pool-team-napps-prod                  DONE    2022-01-27T13:10:53.648718966Z  2022-01-27T13:29:25.550432686Z
      operation-1643291363713-6114b146  UPGRADE_NODES  us-east1-b  pool-team-ui                          DONE    2022-01-27T13:49:23.713240908Z  2022-01-27T13:52:57.170976074Z
      operation-1643293853717-38525f89  UPGRADE_NODES  us-east1-b  test-nodes-setup                      DONE    2022-01-27T14:30:53.717594304Z  2022-01-27T14:34:32.609406702Z
      operation-1643296373770-1b813ae4  UPGRADE_NODES  us-east1-b  pool-2                                DONE    2022-01-27T15:12:53.770268609Z  2022-01-27T15:47:11.930203508Z
      operation-1643300093939-dfe17063  UPGRADE_NODES  us-east1-b  pool-3-nos-build                      DONE    2022-01-27T16:14:53.939792108Z  2022-01-27T16:15:20.915064748Z
      operation-1643301353963-fffbd2c6  UPGRADE_NODES  us-east1-b  pool-team-ai                          DONE    2022-01-27T16:35:53.963802585Z  2022-01-27T18:15:45.953724714Z
      

      As a result, the nodes were drained and the pods shut down, then restarted once the node upgrade was done. On restart, the Nexus pod had to pull its image again.

      The Nexus deployment still relies on an old jenkins-x-platform chart that points to the container registry formerly used by Jenkins X: gcr.io/jenkinsxio.
      However, Jenkins X has since moved its v2 artifacts, and the images are now stored in a new registry: ghcr.io/jenkins-x.
      That's why the Nexus image referenced by the jenkins-x-nexus deployment could not be found:

      $ kubectl get deployment jenkins-x-nexus -ojsonpath='{.spec.template.spec.containers[0].image}'
      gcr.io/jenkinsxio/nexus:0.1.36
      

      To fix this, we need to move from:

      gcr.io/jenkinsxio/nexus:0.1.36
      

      to:

      ghcr.io/jenkins-x/nexus:0.1.37 # 0.1.36 is not available in the new registry
      
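      The registry move can be sketched as a small rewrite of the image reference. This is only an illustration and not necessarily how the fix was applied: since the deployment comes from the jenkins-x-platform chart, the durable change likely belongs in the chart values. The function name `migrate_nexus_image` is hypothetical, and the container name in the commented `kubectl set image` call is a placeholder that is not given in this issue.

```shell
# Map an image on the old Jenkins X registry to the new one.
# Assumption: only the registry prefix changes. The tag is passed separately
# because a given tag (here 0.1.36) may not exist in the new registry.
migrate_nexus_image() {
  old_image=$1                      # e.g. gcr.io/jenkinsxio/nexus:0.1.36
  new_tag=$2                        # e.g. 0.1.37
  repo=${old_image%:*}              # drop the ":tag" suffix
  name=${repo#gcr.io/jenkinsxio/}   # drop the old registry prefix
  printf 'ghcr.io/jenkins-x/%s:%s\n' "$name" "$new_tag"
}

# Usage (requires cluster access; <container-name> is a placeholder):
#   kubectl set image deployment/jenkins-x-nexus \
#     <container-name>="$(migrate_nexus_image gcr.io/jenkinsxio/nexus:0.1.36 0.1.37)"
```

      A direct `kubectl set image` would be overwritten the next time the chart is applied, so it is at best a stopgap until the chart itself points to the new registry.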

      Note: this kind of issue should encourage us to move forward with NXBT-3559.
