Nuxeo Platform / NXP-27544

Fix ComputationRunner processLoop timeout ERRORs in logs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 9.10, 10.10, 11.1-SNAPSHOT
    • Fix Version/s: 9.10-HF34, 10.10-HF09, 11.1, 2021.0
    • Component/s: Streams
    • Release Notes Summary: ComputationRunner processLoop times out without errors in logs
    • Backlog priority: 700
    • Sprint: nxplatform 11.1.12
    • Story Points: 3

      Description

      When running 9.10-HF32 with StreamWorkManager, the ComputationRunner intermittently logs the error below. This appears to be a synchronization issue on lastReadTime: the field is updated in onPartitionsAssigned and processRecord, and read in getTimeoutDuration without any protection, so the elapsed time computed there can apparently come out negative. Suggested fix: replace getTimeoutDuration with the following, which clamps the elapsed time at zero:

      protected Duration getTimeoutDuration() {
          // Clamp at zero so the timeout passed to the consumer is never negative
          long millis = Math.max(0, System.currentTimeMillis() - lastReadTime);
          return Duration.ofMillis(Math.min(READ_TIMEOUT.toMillis(), millis));
      }
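
      To illustrate the clamping, here is a minimal standalone sketch of the same arithmetic. The class TimeoutSketch and the helper timeoutFor are hypothetical names, and the READ_TIMEOUT value of 25 ms is an assumption for illustration only; the clock and lastReadTime are passed in as parameters so the negative case seen in the log can be reproduced directly.

```java
import java.time.Duration;

public class TimeoutSketch {
    // Assumed value for illustration; the real READ_TIMEOUT in ComputationRunner may differ.
    static final Duration READ_TIMEOUT = Duration.ofMillis(25);

    // Hypothetical helper mirroring the suggested getTimeoutDuration(),
    // with the current time and lastReadTime supplied by the caller.
    static Duration timeoutFor(long nowMillis, long lastReadTime) {
        // Elapsed time since the last read, never below zero.
        long elapsed = Math.max(0, nowMillis - lastReadTime);
        // Never wait longer than READ_TIMEOUT.
        return Duration.ofMillis(Math.min(READ_TIMEOUT.toMillis(), elapsed));
    }

    public static void main(String[] args) {
        long lastRead = 1559784384019L; // lastRead value taken from the error log

        // Clock reads 5 ms before lastRead: unclamped this would be -5.
        System.out.println(timeoutFor(lastRead - 5, lastRead).toMillis());
        // Clock reads 10 ms after lastRead: below the cap, returned as is.
        System.out.println(timeoutFor(lastRead + 10, lastRead).toMillis());
        // Clock reads 1 s after lastRead: capped at READ_TIMEOUT.
        System.out.println(timeoutFor(lastRead + 1000, lastRead).toMillis());
    }
}
```

      The clamp only hides the negative value. It does not address the unsynchronized access to lastReadTime itself, which may warrant a separate look.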
      

      Error generated:

      2019-06-06 01:26:24,014 ERROR [defaultPool-02,in:8352,inCheckpoint:8351,out,0,lastRead:1559784384019,lastTimer:0,wm:204444058777681921,loop:724872,checkpoint] [org.nuxeo.lib.stream.computation.log.ComputationRunner] default: Exception in processLoop: Timeout must not be negative
      java.lang.IllegalArgumentException: Timeout must not be negative
          at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1094)
          at org.nuxeo.lib.stream.log.kafka.KafkaLogTailer.poll(KafkaLogTailer.java:196)
          at org.nuxeo.lib.stream.log.kafka.KafkaLogTailer.read(KafkaLogTailer.java:147)
          at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecord(ComputationRunner.java:262)
          at org.nuxeo.lib.stream.computation.log.ComputationRunner.processLoop(ComputationRunner.java:183)
          at org.nuxeo.lib.stream.computation.log.ComputationRunner.run(ComputationRunner.java:142)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      ...
      

      Master code has the same issue:
      https://github.com/nuxeo/nuxeo/blob/master/nuxeo-runtime/nuxeo-stream/src/main/java/org/nuxeo/lib/stream/computation/log/ComputationRunner.java#L339

        Time Tracking

        • Estimated: 0m
        • Remaining: 0m
        • Logged: 1h