Uploaded image for project: 'Nuxeo Platform'
  1. Nuxeo Platform
  2. NXP-26337

Fix random in testComplexTopoManyRecords with CQ

    XMLWordPrintable

    Details

      Description

      org.nuxeo.lib.stream.tests.computation.TestLogStreamProcessorChronicle.testComplexTopoManyRecords (from org.nuxeo.lib.stream.tests.computation.TestLogStreamProcessorKafkaNoSubscribe)
      Failing for the past 1 build (Since Failed#1154 )
      Took 15 sec.
      	This test was not claimed.
      Error Message
      
      expected:<8024> but was:<6022>
      
      Stacktrace
      
      java.lang.AssertionError: expected:<8024> but was:<6022>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:834)
      	at org.junit.Assert.assertEquals(Assert.java:645)
      	at org.junit.Assert.assertEquals(Assert.java:631)
      	at org.nuxeo.lib.stream.tests.computation.TestStreamProcessor.testComplexTopo(TestStreamProcessor.java:248)
      	at org.nuxeo.lib.stream.tests.computation.TestStreamProcessor.testComplexTopoManyRecords(TestStreamProcessor.java:284)
      

      There has some locking problem on offset log that might explain an extra delay and the lag in processing,

      01:40:01,755 [C4Pool-00,in:1,inCheckpoint:0,out:0,lastRead:1543455601752,lastTimer:0,wm:202303812495343616,loop:0,record] ERROR [ComputationRunner] C4: CHECKPOINT FAILURE: Resume may create duplicates.
      01:40:16,766 [C4Pool-02,in:1,inCheckpoint:0,out:0,lastRead:1543455601757,lastTimer:0,wm:202303812494557184,loop:0,record] WARN  [Slf4jExceptionHandler$2] Couldn't acquire write lock after 15000ms for the lock file:/tmp/junit8051582017018032345/junit8567907696658844218/s3/offset-C4/metadata.cq4t, overriding the lock. Lock was held by PID 99176
      01:40:16,766 [C4Pool-01,in:1,inCheckpoint:0,out:0,lastRead:1543455601753,lastTimer:0,wm:202303812495474688,loop:0,record] WARN  [Slf4jExceptionHandler$2] Couldn't acquire write lock after 15000ms for the lock file:/tmp/junit8051582017018032345/junit8567907696658844218/s3/offset-C4/metadata.cq4t, overriding the lock. Lock was held by PID 99176
      01:40:16,768 [C4Pool-02,in:1,inCheckpoint:0,out:0,lastRead:1543455601757,lastTimer:0,wm:202303812494557184,loop:0,record] WARN  [Slf4jExceptionHandler$2] Forced unlock for the lock file:/tmp/junit8051582017018032345/junit8567907696658844218/s3/offset-C4/metadata.cq4t
      net.openhft.chronicle.core.StackTrace: null
      	at net.openhft.chronicle.queue.impl.table.AbstractTSQueueLock.forceUnlock(AbstractTSQueueLock.java:56) [chronicle-queue-5.17.0.jar:?]
      	at net.openhft.chronicle.queue.impl.single.TableStoreWriteLock.lock(TableStoreWriteLock.java:57) [chronicle-queue-5.17.0.jar:?]
      	at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.writingDocument(SingleChronicleQueueExcerpts.java:354) [chronicle-queue-5.17.0.jar:?]
      	at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.writingDocument(SingleChronicleQueueExcerpts.java:346) [chronicle-queue-5.17.0.jar:?]
      	at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.writeBytes(SingleChronicleQueueExcerpts.java:143) [chronicle-queue-5.17.0.jar:?]
      	at org.nuxeo.lib.stream.log.chronicle.ChronicleLogOffsetTracker.commit(ChronicleLogOffsetTracker.java:146) [classes/:?]
      	at org.nuxeo.lib.stream.log.chronicle.ChronicleLogTailer.commit(ChronicleLogTailer.java:141) [classes/:?]
      	at org.nuxeo.lib.stream.log.chronicle.ChronicleLogTailer.commit(ChronicleLogTailer.java:150) [classes/:?]
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.saveOffsets(ComputationRunner.java:387) [classes/:?]
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.checkpoint(ComputationRunner.java:369) [classes/:?]
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.checkpointIfNecessary(ComputationRunner.java:354) [classes/:?]
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecord(ComputationRunner.java:302) [classes/:?]
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.processLoop(ComputationRunner.java:199) [classes/:?]
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.run(ComputationRunner.java:158) [classes/:?]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
      	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
      

      observed for the first time:
      https://qa.nuxeo.org/jenkins/job/Deploy/job/IT-nuxeo-10.3-build/1154/

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated: