Nuxeo Platform / NXP-32010

Reduce load on Kafka during unit tests

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2023.2, 2021.43
    • Component/s: Tests
    • Sprint: nxplatform #93, nxplatform #94
    • Story Points: 2

      Description

      Deleting and creating Kafka topics is slow and resource-intensive.
      This slows down the CI unit tests and causes random failures.

      For instance, topic deletion is asynchronous, so recreating a topic that has not yet been deleted sometimes fails.
      Because the topic prefix is a static final value shared between test suites, the same topics are repeatedly deleted and recreated.

      WARN  [KafkaUtils] Cannot create topic, it already exists: nuxeo-test-1690545939613-internal-processors
      WARN  [KafkaUtils] Waiting for brokers to become aware that the topic nuxeo-test-1690545939613-internal-processors has been created.
      ERROR [RegistrationInfoImpl] Component service:org.nuxeo.runtime.stream.service notification of application started failed: java.util.concurrent.TimeoutException: Timeout while waiting for topic nuxeo-test-1690545939613-internal-processors metadata propagation in the cluster
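
      The name collision described above can be illustrated with a minimal sketch. TestTopicPrefix and its members are hypothetical names, not part of Nuxeo, and this is not necessarily the fix that was applied: the sketch only shows how a per-suite counter avoids recreating a topic whose deletion is still pending. Note that distinct names also increase the total number of topics, which matters for the memory pressure discussed in this ticket.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not a Nuxeo class): builds a distinct topic prefix per test suite. */
public final class TestTopicPrefix {

    // Fixed once per JVM, comparable to the static final prefix described in the ticket.
    private static final long JVM_START = System.currentTimeMillis();

    // Incremented on every call so that two suites never share a prefix.
    private static final AtomicInteger SUITE_COUNTER = new AtomicInteger();

    private TestTopicPrefix() {
    }

    /** Returns a prefix of the form "nuxeo-test-<millis>-<counter>-". */
    public static String next() {
        return "nuxeo-test-" + JVM_START + "-" + SUITE_COUNTER.incrementAndGet() + "-";
    }

    public static void main(String[] args) {
        // Two suites in the same JVM get different topic names, so the second
        // suite never tries to recreate a topic that is still being deleted.
        System.out.println(next() + "internal-processors");
        System.out.println(next() + "internal-processors");
    }
}
```

      Each test suite would call TestTopicPrefix.next() once during setup, under the assumption that the prefix can be supplied per suite rather than per JVM.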
      

      Also, we have seen Kafka or ZooKeeper restarting in the CI Kubernetes cluster. Kafka is subject to memory pressure at the end of the tests because of the huge number of topics (even when they are recreated with the same name). When using distinct names we might trigger the OOM killer, which shows that the heap is not correctly sized for Kafka.

      Full stack trace example:

      Error
      Failed to invoke operation WorkManager.RunWorkInFailure
      Stacktrace
      org.nuxeo.ecm.automation.OperationException: Failed to invoke operation WorkManager.RunWorkInFailure
      	at org.nuxeo.ecm.automation.core.impl.InvokableMethod.invoke(InvokableMethod.java:186)
      	at org.nuxeo.ecm.automation.core.impl.OperationChainCompiler$OperationMethod.invoke(OperationChainCompiler.java:147)
      	at org.nuxeo.ecm.automation.core.impl.OperationChainCompiler$CompiledChainImpl.lambda$invoke$0(OperationChainCompiler.java:212)
      
      ...
      
      Caused by: java.lang.NullPointerException: Cannot invoke "org.nuxeo.lib.stream.computation.StreamManager.registerAndCreateProcessor(String, org.nuxeo.lib.stream.computation.Topology, org.nuxeo.lib.stream.computation.Settings)" because "streamManager" is null
      	at org.nuxeo.ecm.automation.core.operations.services.workmanager.WorkManagerRunWorkInFailure.run(WorkManagerRunWorkInFailure.java:95)
      	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
      	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
      	at org.nuxeo.ecm.automation.core.impl.InvokableMethod.doInvoke(InvokableMethod.java:164)
      	at org.nuxeo.ecm.automation.core.impl.InvokableMethod.invoke(InvokableMethod.java:176)
      	... 53 more
      ...
      2023-07-28 12:14:56,725 [main] WARN  [KafkaUtils] Cannot create topic, it already exists: nuxeo-test-1690545939613-internal-processors
      2023-07-28 12:14:56,727 [main] WARN  [KafkaUtils] Waiting for brokers to become aware that the topic nuxeo-test-1690545939613-internal-processors has been created.
      2023-07-28 12:17:56,792 [main] ERROR [RegistrationInfoImpl] Component service:org.nuxeo.runtime.stream.service notification of application started failed: java.util.concurrent.TimeoutException: Timeout while waiting for topic nuxeo-test-1690545939613-internal-processors metadata propagation in the cluster
      org.nuxeo.lib.stream.StreamRuntimeException: java.util.concurrent.TimeoutException: Timeout while waiting for topic nuxeo-test-1690545939613-internal-processors metadata propagation in the cluster
      	at org.nuxeo.lib.stream.log.kafka.KafkaUtils.waitForTopicCreation(KafkaUtils.java:237) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
      	at org.nuxeo.lib.stream.log.kafka.KafkaUtils.createTopic(KafkaUtils.java:224) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
      	at org.nuxeo.lib.stream.log.kafka.KafkaLogManager.create(KafkaLogManager.java:103) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
      	at org.nuxeo.lib.stream.log.internals.AbstractLogManager.createIfNotExists(AbstractLogManager.java:75) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
      	at org.nuxeo.lib.stream.log.UnifiedLogManager.createIfNotExists(UnifiedLogManager.java:122) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
      	at org.nuxeo.lib.stream.computation.log.LogStreamManager.initInternalStream(LogStreamManager.java:104) ~[nuxeo-stream-2023.1.15-PR-1319-BUILD-2.jar:?]
      
      Caused by: java.util.concurrent.TimeoutException: Timeout while waiting for topic nuxeo-test-1690545939613-internal-processors metadata propagation in the cluster
      	... 34 more
      
      
