- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Component/s: CI/CD
- Tags:
- Team: PLATFORM
- Sprint: nxplatform #87
- Story Points: 1
Sometimes in the Platform CI we see the following logs in the Kafka output:
[2023-05-04 13:03:05,301] INFO Opening socket connection to server kafka-zookeeper/10.63.244.84:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2023-05-04 13:03:05,302] INFO Socket connection established, initiating session, client: /10.60.250.8:58242, server: kafka-zookeeper/10.63.244.84:2181 (org.apache.zookeeper.ClientCnxn)
[2023-05-04 13:03:05,303] INFO Unable to read additional data from server sessionid 0x10000225a140000, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
We observed on a build that this can happen when the Zookeeper pod is re-deployed, for example when the kubelet removes the underlying node.
After some digging on the internet, it seems the culprit is the Zookeeper data being lost between the two pods.
Persistence is enabled by default in the Kafka chart, whereas we disable it for unit tests.
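For reference, disabling persistence in the chart values looks roughly like the following sketch. The key paths assume a Bitnami-style Kafka chart and may differ in our actual setup:

```yaml
# Hypothetical values override used for unit tests.
# Key paths assume a Bitnami-style Kafka chart; verify against
# the chart's values.yaml before relying on them.
persistence:
  enabled: false          # no PVC for the Kafka broker itself
zookeeper:
  persistence:
    enabled: false        # Zookeeper state is lost when its pod is rescheduled
```

With persistence off, any rescheduling of the Zookeeper pod discards its state, which matches the reconnect errors seen in the logs above.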
It occurs on a node pool scale-down, which makes sense since our k8s cluster was upgraded recently and is more aggressive about scaling nodes down. We will prevent this behavior by setting a PDB (PodDisruptionBudget) on each of the services needed for tests.
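As a sketch, such a PDB for the Zookeeper pod could look like the following. The resource name and labels are illustrative, not the actual chart values:

```yaml
# Hypothetical PodDisruptionBudget blocking voluntary evictions
# (e.g. node drains triggered by autoscaler scale-down).
# The name and matchLabels are illustrative; use the labels the
# chart actually sets on the Zookeeper pod.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-zookeeper-pdb
spec:
  maxUnavailable: 0       # forbid evicting the single Zookeeper pod
  selector:
    matchLabels:
      app.kubernetes.io/name: zookeeper
```

With maxUnavailable: 0, the cluster autoscaler cannot drain the node hosting the pod, so Zookeeper is not rescheduled mid-build. Note this only blocks voluntary disruptions; an actual node failure would still lose the pod.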
- is related to NXP-31862 Remove Zookeeper in CI tests - Open