  Nuxeo ECM Build/Test Environment / NXBT-3624

[Kubernetes CI] Network issues blocking Platform, AI and Web UI CI


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Continuous Integration

      Description

      Since 24 March, all the Jenkins instances (platform, napps, ai) are unavailable. Opening https://jenkins.platform.dev.nuxeo.com/ fails during the GitHub login: it redirects to https://jenkins.platform.dev.nuxeo.com/securityRealm/finishLogin?code=a502e9fda8fbb80d68b7&state=VHjDS2Qggmb.NNhY1hLKQZabL5o.
      The jenkins pod logs, available in the GCP console, show that github.com cannot be resolved:

      2022-03-24 14:55:25.666+0000 [id=6540] WARNING o.e.j.s.h.ContextHandler$Context#log: Error while serving https://jenkins.platform.dev.nuxeo.com/securityRealm/finishLogin

      2022-03-24 14:55:25.668 GMT java.net.UnknownHostException: github.com: Temporary failure in name resolution
          at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
          at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)
          at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)
          at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
          at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)
          at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)
          at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)
          at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
          at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
          at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
          at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
          at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
          at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
          at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
          at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
          at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
          at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
          at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
          at org.jenkinsci.plugins.GithubSecurityRealm.getAccessToken(GithubSecurityRealm.java:461)
          at org.jenkinsci.plugins.GithubSecurityRealm.doFinishLogin(GithubSecurityRealm.java:399)
          at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710)
          at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:393)
      

      We also observe the following issue with kubectl:

      $ kubectl logs -f jenkins-0 -c jenkins
      Error from server: Get "https://10.142.0.57:10250/containerLogs/platform/jenkins-0/jenkins?follow=true": dial timeout, backstop
      

      The Web UI GitHub Actions runner pods are also impacted (they cannot resolve api.github.com either), so the problem is not specific to Jenkins:

      2022-03-24 14:58:33.650 GMT curl: (6) Could not resolve host: api.github.com
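      Both failures (the UnknownHostException in Jenkins and the curl error in the runner pods) point at name resolution inside the cluster. To check DNS independently of Jenkins and of the GitHub plugin, one could run a small probe from an affected pod. The sketch below is a hypothetical helper (the DnsProbe class is not part of Jenkins or of any Nuxeo tooling); it calls java.net.InetAddress.getAllByName, the same JDK method at the top of the stack trace, and defaults to the two host names seen failing in this issue.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical DNS probe: resolves each host name with the JVM's default
 * resolver (InetAddress.getAllByName), as the Jenkins GitHub login does.
 */
public class DnsProbe {

    /** Returns "OK   host -> addr, ..." on success or "FAIL host: reason" on failure. */
    static String probe(String host) {
        try {
            InetAddress[] addresses = InetAddress.getAllByName(host);
            List<String> ips = new ArrayList<>();
            for (InetAddress address : addresses) {
                ips.add(address.getHostAddress());
            }
            return "OK   " + host + " -> " + String.join(", ", ips);
        } catch (UnknownHostException e) {
            // Thrown both for "Temporary failure in name resolution" and for unknown names.
            return "FAIL " + host + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Default to the hosts that failed in this issue; any arguments override them.
        List<String> hosts = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("github.com", "api.github.com");
        boolean allResolved = true;
        for (String host : hosts) {
            String status = probe(host);
            System.out.println(status);
            allResolved &= status.startsWith("OK");
        }
        // Non-zero exit code so the probe can be used from scripts.
        System.exit(allResolved ? 0 : 1);
    }
}
```

      Possible usage, assuming the jenkins container ships a JDK 11+ (needed for the single-file source launcher) and tar (needed by kubectl cp):

      $ kubectl cp DnsProbe.java jenkins-0:/tmp/DnsProbe.java -c jenkins
      $ kubectl exec jenkins-0 -c jenkins -- java /tmp/DnsProbe.java

      Note that kubectl exec, like kubectl logs, goes through the API server to the kubelet, so it may hit the same dial timeout; in that case the probe has to be run by other means, for instance as a one-off pod.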
      


    People

    • Assignee: Antoine Taillefer (ataillefer)
    • Reporter: Antoine Taillefer (ataillefer)
    • Votes: 0
    • Watchers: 2
