Nuxeo Platform / NXP-25399

Fix random AuditLogWriter error in logs during functional tests


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.2-SNAPSHOT
    • Fix Version/s: 10.3
    • Component/s: Audit, Elasticsearch

      Description

      The following stack trace can appear in server logs:

      2018-07-12 16:59:23,007 INFO  [http-nio-0.0.0.0-8080-exec-18] [org.nuxeo.runtime.reload.ReloadComponent] Before stop/standby component manager
      2018-07-12 16:59:24,196 WARN  [http-nio-0.0.0.0-8080-exec-18] [org.nuxeo.lib.stream.computation.log.ComputationPool] AuditLogWriter: Timeout on wait for pool termination
      2018-07-12 16:59:24,197 ERROR [AuditLogWriterPool-00,in:17,inCheckpoint:15,out:0,lastRead:1531414762251,lastTimer:1531414762027,wm:200725595649605633,loop:29,record] [org.nuxeo.lib.stream.computation.log.ComputationRunner] AuditLogWriter: Exception in processLoop: thread waiting for the response was interrupted
      java.lang.RuntimeException: thread waiting for the response was interrupted
      	at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:687)
      	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
      	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:198)
      	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:522)
      	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:508)
      	at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:348)
      	at org.nuxeo.elasticsearch.client.ESRestClient.index(ESRestClient.java:379)
      	at org.nuxeo.elasticsearch.seqgen.ESUIDSequencer.getNextLong(ESUIDSequencer.java:93)
      	at org.nuxeo.elasticsearch.audit.ESAuditBackend.addLogEntries(ESAuditBackend.java:483)
      	at org.nuxeo.ecm.platform.audit.impl.StreamAuditWriter$AuditLogWriterComputation.writeEntriesToAudit(StreamAuditWriter.java:127)
      	at org.nuxeo.ecm.platform.audit.impl.StreamAuditWriter$AuditLogWriterComputation.processTimer(StreamAuditWriter.java:96)
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.lambda$processTimer$2(ComputationRunner.java:246)
      	at java.util.LinkedHashMap.forEach(LinkedHashMap.java:676)
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.processTimer(ComputationRunner.java:244)
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.processLoop(ComputationRunner.java:193)
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.run(ComputationRunner.java:152)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
      	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
      	at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:683)
      	... 20 more
      

      This is extracted from the server.log of the hot-reload server tests.

      From Benoit Delbosc:

      The situation here is that on hot reload the stream service is restarted. During stop, we wait for computations to terminate their current processing; in this case we wait 1s for the AuditLogWriter computation, but that is not enough, so a timeout WARN is logged. The stop procedure then continues and interrupts the computation, which raises the ERROR.
      So there is no bug here; the behavior is correct. This probably happens because Elasticsearch is slow and impacted by NXP-25355.
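
      As an illustration only (not the actual ComputationPool code), the stop sequence follows the standard "drain with a grace period, then interrupt" pattern; the pool, the 1s timeout and the log messages below are assumptions taken from the log excerpt above:

      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.TimeUnit;

      public class PoolStopSketch {
          private final ExecutorService pool = Executors.newFixedThreadPool(4);

          // Hypothetical stop sequence: let in-flight work finish, then interrupt.
          public void stop() throws InterruptedException {
              pool.shutdown();
              // If the computation does not finish within the grace period,
              // a timeout WARN is emitted...
              if (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                  System.err.println("WARN AuditLogWriter: Timeout on wait for pool termination");
                  // ...then the worker threads are interrupted, which surfaces as the
                  // "thread waiting for the response was interrupted" ERROR inside the
                  // Elasticsearch REST client call that is still in flight.
                  pool.shutdownNow();
              }
          }
      }
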
      The errors in the logs could be mitigated by:

      • wait a few seconds after actions before doing a hot reload
      • log all interrupted exceptions as WARN instead of ERROR (see the sketch after this list)
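
      The second mitigation amounts to treating an interruption during shutdown as expected. A minimal sketch of that idea (not the actual ComputationRunner code; the class and method names are hypothetical):

      import org.apache.logging.log4j.LogManager;
      import org.apache.logging.log4j.Logger;

      public class InterruptAwareRunner {
          private static final Logger log = LogManager.getLogger(InterruptAwareRunner.class);

          // Hypothetical processing step: an InterruptedException wrapped in a
          // RuntimeException during shutdown is logged as WARN instead of ERROR.
          void runStep(Runnable step) {
              try {
                  step.run();
              } catch (RuntimeException e) {
                  if (e.getCause() instanceof InterruptedException) {
                      Thread.currentThread().interrupt(); // restore the interrupt flag
                      log.warn("Computation interrupted during shutdown, stopping", e);
                      return;
                  }
                  throw e;
              }
          }
      }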
