- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: None
- Component/s: Jenkins X
- Tags:
- Sprint: DevTools-09
Our Jenkins X pipelines frequently fail with a "Connection reset by peer" error.
For instance, in the nuxeo pipeline:
14:08:23 Downloading packages:
14:08:24 warning: /var/cache/yum/x86_64/7/epel/packages/cfitsio-3.370-10.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
14:08:24 Public key for cfitsio-3.370-10.el7.x86_64.rpm is not installed
14:08:30 http://nexus/repository/yum-registry/x264-2980-1.el7.x86_64.rpm: [Errno 14] curl#56 - "Recv failure: Connection reset by peer"
or in the jx-platform-builders pipeline:
17:41:11 rpc error: code = Unknown desc = error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.27/containers/b27c789727ac0225a3b91dd952cbae5b0d89e1a98af9e9654e3cdb91f1391642/json: read unix @->/var/run/docker.sock: read: connection reset by peer
According to these blog posts, it's a Kubernetes issue:
kube-proxy Subtleties: Debugging an Intermittent Connection Reset
SOLVING CONNECTION RESET ISSUE IN KUBERNETES
The issue is fixed by https://github.com/kubernetes/kubernetes/pull/74840, which resolves https://github.com/kubernetes/kubernetes/issues/74839 and was released in v1.15.0-alpha.2. However, the latest GKE version available is 1.13.7-gke.8, so we cannot get the fix by upgrading yet.
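To check whether a given cluster version is new enough to contain the fix, a comparison along these lines could be used. This is only a sketch: `version_at_least` is a hypothetical helper name, it assumes a `sort` that supports `-V` (GNU coreutils), and it ignores suffixes such as `-gke.8` or `-alpha.2`, so pre-releases are treated like the final release.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit 0) if version $1 is at least version $2.
version_at_least() {
  local have="${1#v}" want="${2#v}"
  # Drop any suffix such as "-gke.8" so only MAJOR.MINOR.PATCH is compared.
  have="${have%%-*}"
  want="${want%%-*}"
  # With version sort, the smaller version comes first; if that is $want,
  # then $have is equal to or newer than $want.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# Versions taken from this ticket: latest GKE version vs. first release with the fix.
if version_at_least "1.13.7-gke.8" "1.15.0"; then
  echo "fix included"
else
  echo "fix not included, workaround still needed"
fi
```

With the versions quoted in this ticket, the script prints "fix not included, workaround still needed".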
Meanwhile, we can apply the workaround suggested in the first blog: "there is a way to mitigate the problem by applying the following rule in your cluster."
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: startup-script
  labels:
    app: startup-script
spec:
  template:
    metadata:
      labels:
        app: startup-script
    spec:
      hostPID: true
      containers:
      - name: startup-script
        image: gcr.io/google-containers/startup-script:v1
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        env:
        - name: STARTUP_SCRIPT
          value: |
            #! /bin/bash
            echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
            echo done
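One possible way to roll out the DaemonSet manifest above is sketched here. The function name `apply_workaround`, the file name `startup-script.yaml`, and the choice of namespace are assumptions for illustration, not part of the original workaround; the sketch also assumes `kubectl` is already configured for the affected cluster.

```shell
#!/usr/bin/env bash

# Hypothetical helper: apply the workaround manifest, then list its pods.
#   $1: path to the file holding the DaemonSet manifest (assumed file name)
#   $2: namespace to deploy into (defaults to "default")
apply_workaround() {
  local manifest="$1" namespace="${2:-default}"
  # Create or update the DaemonSet from the manifest.
  kubectl apply --namespace "$namespace" -f "$manifest" || return 1
  # Show one line per pod; -o wide includes the node, so every node
  # can be checked for a running startup-script pod.
  kubectl get pods --namespace "$namespace" -l app=startup-script -o wide
}
```

It could then be called as `apply_workaround startup-script.yaml`, after which each node in the cluster should show one `startup-script` pod.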