- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Component/s: Tests
- Tags:
- Sprint: nxplatform #93, nxplatform #94
- Story Points: 2
Deleting and creating Kafka topics is slow and resource-intensive.
This slows down the CI unit tests and causes random failures.
For instance, topic deletion is asynchronous, so recreating a topic that has not yet been deleted sometimes fails.
Because the topic prefix is a static final value shared across test suites, the same topics are deleted and recreated over and over.
WARN [KafkaUtils] Cannot create topic, it already exists: nuxeo-test-1690545939613-internal-processors
WARN [KafkaUtils] Waiting for brokers to become aware that the topic nuxeo-test-1690545939613-internal-processors has been created.
ERROR [RegistrationInfoImpl] Component service:org.nuxeo.runtime.stream.service notification of application started failed: java.util.concurrent.TimeoutException: Timeout while waiting for topic nuxeo-test-1690545939613-internal-processors metadata propagation in the cluster
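The race described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `FakeBroker`, `waitForDeletion`, and `purge` are hypothetical names invented for illustration and are not part of Kafka or of the Nuxeo `KafkaUtils` class. It shows why recreating a topic right after requesting its deletion is rejected, and how polling until the deletion completes avoids the conflict.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of asynchronous topic deletion; hypothetical, not the Kafka API. */
public class AsyncTopicDeletionSketch {

    /** Hypothetical fake broker: delete() only marks the topic, purge() completes the deletion later. */
    static class FakeBroker {
        private final Set<String> topics = ConcurrentHashMap.newKeySet();
        private final Set<String> pendingDeletion = ConcurrentHashMap.newKeySet();

        /** Returns false when the name is still taken, including by a topic awaiting deletion. */
        boolean create(String topic) {
            return topics.add(topic);
        }

        /** Asynchronous deletion: the request is recorded but the topic still exists. */
        void delete(String topic) {
            if (topics.contains(topic)) {
                pendingDeletion.add(topic);
            }
        }

        /** Simulates the broker finishing all pending deletions in the background. */
        void purge() {
            topics.removeAll(pendingDeletion);
            pendingDeletion.clear();
        }

        boolean exists(String topic) {
            return topics.contains(topic);
        }
    }

    /** Polls until the topic is gone or the attempts run out; onPoll stands in for time passing. */
    static boolean waitForDeletion(FakeBroker broker, String topic, int maxAttempts, Runnable onPoll) {
        for (int i = 0; i < maxAttempts; i++) {
            if (!broker.exists(topic)) {
                return true;
            }
            onPoll.run();
        }
        return !broker.exists(topic);
    }

    public static void main(String[] args) {
        FakeBroker broker = new FakeBroker();
        String topic = "nuxeo-test-1690545939613-internal-processors";
        broker.create(topic);
        broker.delete(topic);
        // The deletion is still pending, so the name is taken and creation is rejected.
        System.out.println("recreate right after delete: " + broker.create(topic));
        // Wait for the deletion to complete before recreating the topic.
        boolean gone = waitForDeletion(broker, topic, 3, broker::purge);
        System.out.println("deleted after waiting: " + gone);
        System.out.println("recreate after waiting: " + broker.create(topic));
    }
}
```

A real test harness would have to wait on the broker itself rather than on an in-memory set, and waiting does not address the cost of deleting and recreating topics in the first place.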
We have also seen Kafka or ZooKeeper restarting in the CI Kubernetes cluster. Kafka comes under memory pressure at the end of the tests because of the huge number of topics (even when they are recreated with the same name), and when using distinct names we may hit the OOM killer, which shows that the heap is not correctly set for Kafka.
Full stack example:
Error
Failed to invoke operation WorkManager.RunWorkInFailure
Stacktrace
org.nuxeo.ecm.automation.OperationException: Failed to invoke operation WorkManager.RunWorkInFailure
    at org.nuxeo.ecm.automation.core.impl.InvokableMethod.invoke(InvokableMethod.java:186)
    at org.nuxeo.ecm.automation.core.impl.OperationChainCompiler$OperationMethod.invoke(OperationChainCompiler.java:147)
    at org.nuxeo.ecm.automation.core.impl.OperationChainCompiler$CompiledChainImpl.lambda$invoke$0(OperationChainCompiler.java:212)
    ...
Caused by: java.lang.NullPointerException: Cannot invoke "org.nuxeo.lib.stream.computation.StreamManager.registerAndCreateProcessor(String, org.nuxeo.lib.stream.computation.Topology, org.nuxeo.lib.stream.computation.Settings)" because "streamManager" is null
    at org.nuxeo.ecm.automation.core.operations.services.workmanager.WorkManagerRunWorkInFailure.run(WorkManagerRunWorkInFailure.java:95)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.nuxeo.ecm.automation.core.impl.InvokableMethod.doInvoke(InvokableMethod.java:164)
    at org.nuxeo.ecm.automation.core.impl.InvokableMethod.invoke(InvokableMethod.java:176)
    ... 53 more
...
2023-07-28 12:14:56,725 [main] WARN [KafkaUtils] Cannot create topic, it already exists: nuxeo-test-1690545939613-internal-processors
2023-07-28 12:14:56,727 [main] WARN [KafkaUtils] Waiting for brokers to become aware that the topic nuxeo-test-1690545939613-internal-processors has been created.
2023-07-28 12:17:56,792 [main] ERROR [RegistrationInfoImpl] Component service:org.nuxeo.runtime.stream.service notification of application started failed: java.util.concurrent.TimeoutException: Timeout while waiting for topic nuxeo-test-1690545939613-internal-processors metadata propagation in the cluster
org.nuxeo.lib.stream.StreamRuntimeException: java.util.concurrent.TimeoutException: Timeout while waiting for topic nuxeo-test-1690545939613-internal-processors metadata propagation in the cluster
    at org.nuxeo.lib.stream.log.kafka.KafkaUtils.waitForTopicCreation(KafkaUtils.java:237) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
    at org.nuxeo.lib.stream.log.kafka.KafkaUtils.createTopic(KafkaUtils.java:224) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
    at org.nuxeo.lib.stream.log.kafka.KafkaLogManager.create(KafkaLogManager.java:103) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
    at org.nuxeo.lib.stream.log.internals.AbstractLogManager.createIfNotExists(AbstractLogManager.java:75) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
    at org.nuxeo.lib.stream.log.UnifiedLogManager.createIfNotExists(UnifiedLogManager.java:122) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
    at org.nuxeo.lib.stream.computation.log.LogStreamManager.initInternalStream(LogStreamManager.java:104) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
Caused by: java.util.concurrent.TimeoutException: Timeout while waiting for topic nuxeo-test-1690545939613-internal-processors metadata propagation in the cluster
    ... 34 more