Our Jenkins X pipelines frequently fail with a "Connection reset by peer" error.
For instance, in the nuxeo pipeline:
14:08:23 Downloading packages:
14:08:24 warning: /var/cache/yum/x86_64/7/epel/packages/cfitsio-3.370-10.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
14:08:24 Public key for cfitsio-3.370-10.el7.x86_64.rpm is not installed
14:08:30 http://nexus/repository/yum-registry/x264-2980-1.el7.x86_64.rpm: [Errno 14] curl#56 - "Recv failure: Connection reset by peer"
or in the jx-platform-builders pipeline:
17:41:11 rpc error: code = Unknown desc = error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.27/containers/b27c789727ac0225a3b91dd952cbae5b0d89e1a98af9e9654e3cdb91f1391642/json: read unix @->/var/run/docker.sock: read: connection reset by peer
According to these blog posts, it's a Kubernetes issue:
kube-proxy Subtleties: Debugging an Intermittent Connection Reset
SOLVING CONNECTION RESET ISSUE IN KUBERNETES
The issue was fixed by https://github.com/kubernetes/kubernetes/pull/74840, which resolves https://github.com/kubernetes/kubernetes/issues/74839 and was released in v1.15.0-alpha.2. However, the latest GKE version available is 1.13.7-gke.8, so we cannot get the fix by upgrading yet.
Meanwhile, we can apply the workaround suggested in the first blog: "there is a way to mitigate the problem by applying the following rule in your cluster."
- name: startup-script
- name: STARTUP_SCRIPT
echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
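The fragment above has lost its surrounding manifest. A minimal sketch of how the rule could be rolled out to every node is a privileged DaemonSet that runs the script once per node. Only the container name, the STARTUP_SCRIPT variable and the echo command come from this ticket. The apiVersion, labels, image and security settings below are assumptions and should be checked against the blog post before applying.

```yaml
# Sketch only: everything except the container name, STARTUP_SCRIPT and the
# echo command is an assumption, not taken from this ticket.
apiVersion: apps/v1            # assumed; apps/v1 requires spec.selector
kind: DaemonSet
metadata:
  name: startup-script         # assumed to match the container name
  labels:
    app: startup-script        # hypothetical label
spec:
  selector:
    matchLabels:
      app: startup-script
  template:
    metadata:
      labels:
        app: startup-script
    spec:
      hostPID: true            # assumed: the script must act on the host, not the container
      containers:
        - name: startup-script
          image: gcr.io/google-containers/startup-script:v1   # assumed image
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true   # assumed: needed to write under /proc/sys
          env:
            - name: STARTUP_SCRIPT
              value: |
                #! /bin/bash
                # Tell conntrack not to mark out-of-window packets as INVALID
                echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
                echo done
```

On nodes with a recent kernel, the same setting is exposed as /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal, so the path may need adjusting depending on the node image.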