Nuxeo ECM Build/Test Environment / NXBT-3000

Intermittent Connection Reset issues in Jenkins X pipelines

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Jenkins X

      Description

      Our Jenkins X pipelines frequently fail with a "Connection reset by peer" error.
      For instance, in the nuxeo pipeline:

      14:08:23  Downloading packages:
      14:08:24  warning: /var/cache/yum/x86_64/7/epel/packages/cfitsio-3.370-10.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
      14:08:24  Public key for cfitsio-3.370-10.el7.x86_64.rpm is not installed
      14:08:30  http://nexus/repository/yum-registry/x264-2980-1.el7.x86_64.rpm: [Errno 14] curl#56 - "Recv failure: Connection reset by peer"
      

      or in the jx-platform-builders pipeline:

      17:41:11  rpc error: code = Unknown desc = error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.27/containers/b27c789727ac0225a3b91dd952cbae5b0d89e1a98af9e9654e3cdb91f1391642/json: read unix @->/var/run/docker.sock: read: connection reset by peer
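
      To gauge how often this happens, the two signatures above can be searched for in saved pipeline logs. The following is a minimal sketch, not part of the original report: the function name find_connection_resets and the sample text are hypothetical, and the second error line is abridged from the jx-platform-builders log.

```python
import re

# Case-insensitive on purpose: yum/curl prints "Connection reset by peer",
# while the Docker client prints "connection reset by peer".
RESET_PATTERN = re.compile(r"connection reset by peer", re.IGNORECASE)


def find_connection_resets(log_text):
    """Return (line_number, line) pairs for log lines reporting a connection reset."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if RESET_PATTERN.search(line)
    ]


# Sample input built from the excerpts in this issue (second error line abridged).
sample_log = """\
14:08:23  Downloading packages:
14:08:30  http://nexus/repository/yum-registry/x264-2980-1.el7.x86_64.rpm: [Errno 14] curl#56 - "Recv failure: Connection reset by peer"
17:41:11  rpc error: code = Unknown desc = error during connect: read unix @->/var/run/docker.sock: read: connection reset by peer
"""

hits = find_connection_resets(sample_log)
for number, line in hits:
    print(f"line {number}: {line}")
```

      Matching on the message text alone keeps the sketch independent of which tool (yum, Docker, or another client) reported the reset.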
      

      According to these blog posts, it is a Kubernetes issue:
      kube-proxy Subtleties: Debugging an Intermittent Connection Reset
      SOLVING CONNECTION RESET ISSUE IN KUBERNETES

      The issue was fixed by https://github.com/kubernetes/kubernetes/pull/74840, which resolves https://github.com/kubernetes/kubernetes/issues/74839 and was released in v1.15.0-alpha.2. However, the latest GKE version available is 1.13.7-gke.8, which predates that release.
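
      The version reasoning above can be expressed as a small check. This is a rough sketch under stated assumptions: the helper names are hypothetical, only the major and minor numbers are compared, and pre-release ordering (for example v1.15.0-alpha.1) and any backports to earlier release lines are ignored.

```python
import re

# Assumption: any cluster on the 1.15 line or later carries the fix, since the
# issue states it was released in v1.15.0-alpha.2. Earlier 1.15 pre-releases
# and possible backports are deliberately not modeled here.
FIX_MAJOR_MINOR = (1, 15)


def parse_major_minor(version):
    """Extract (major, minor) from strings such as 'v1.15.0-alpha.2' or '1.13.7-gke.8'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def likely_includes_fix(version):
    """Return True if the version is on the 1.15 line or newer."""
    return parse_major_minor(version) >= FIX_MAJOR_MINOR


for candidate in ("1.13.7-gke.8", "v1.15.0-alpha.2"):
    print(candidate, likely_includes_fix(candidate))
```

      For the GKE version quoted in this issue the check returns False, which is why a workaround is needed until a newer version becomes available.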

      Meanwhile, we can apply the workaround suggested in the first blog: "there is a way to mitigate the problem by applying the following rule in your cluster."

      apiVersion: extensions/v1beta1
      kind: DaemonSet
      metadata:
        name: startup-script
        labels:
          app: startup-script
      spec:
        template:
          metadata:
            labels:
              app: startup-script
          spec:
            hostPID: true
            containers:
            - name: startup-script
              image: gcr.io/google-containers/startup-script:v1
              imagePullPolicy: IfNotPresent
              securityContext:
                privileged: true
              env:
              - name: STARTUP_SCRIPT
                value: |
                  #! /bin/bash
                  echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
                  echo done
      

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0m
                  • Logged: 2h
