Uploaded image for project: 'Nuxeo Platform'
  1. Nuxeo Platform
  2. NXP-25231

File descriptor leak in Chronicle Queue impl of LogManager.getLag

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.2-SNAPSHOT
    • Fix Version/s: 10.2
    • Component/s: Streams

      Description

      When using the AbstractAuditBackend.await to wait for the audit log writer, the stream processor implementation loop on a LogManager.getLag call until there is no lag.
      After 200s (4k calls) we may run into a "too many open file error":

              org.nuxeo.ecm.automation.elasticsearch.ElasticsearchWaitForIndexingOperation
      
                      at org.nuxeo.ecm.automation.core.trace.TracerFactory.onTrace(TracerFactory.java:198)
                      at org.nuxeo.ecm.automation.core.trace.Tracer.popContext(Tracer.java:73)
                      at org.nuxeo.ecm.automation.core.trace.Tracer.onChainExit(Tracer.java:86)
                      at org.nuxeo.ecm.automation.core.impl.OperationChainCompiler$CompiledChainImpl.lambda$invoke$0(OperationChainCompiler.java:227)
                      ... 112 more
      Caused by: java.lang.IllegalArgumentException: Invalid basePath for queue: /opt/nuxeo/data/stream/audit/audit
              at org.nuxeo.lib.stream.log.chronicle.ChronicleLogAppender.partitions(ChronicleLogAppender.java:298)
              at org.nuxeo.lib.stream.log.chronicle.ChronicleLogAppender.<init>(ChronicleLogAppender.java:88)
              at org.nuxeo.lib.stream.log.chronicle.ChronicleLogAppender.open(ChronicleLogAppender.java:163)
              at org.nuxeo.lib.stream.log.chronicle.ChronicleLogManager.getLagForPartition(ChronicleLogManager.java:140)
              at org.nuxeo.lib.stream.log.chronicle.ChronicleLogManager.getLagPerPartition(ChronicleLogManager.java:159)
              at org.nuxeo.lib.stream.log.LogManager.getLag(LogManager.java:171)
              at org.nuxeo.ecm.platform.audit.service.AbstractAuditBackend.await(AbstractAuditBackend.java:415)
              at org.nuxeo.ecm.automation.elasticsearch.ElasticsearchWaitForIndexingOperation.run(ElasticsearchWaitForIndexingOperation.java:82)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.nuxeo.ecm.automation.core.impl.InvokableMethod.doInvoke(InvokableMethod.java:163)
              at org.nuxeo.ecm.automation.core.impl.InvokableMethod.invoke(InvokableMethod.java:175)
              ... 114 more
      Caused by: java.nio.file.FileSystemException: /opt/nuxeo/data/stream/audit/audit: Too many open files
              at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
              at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
              at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
              at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
              at java.nio.file.Files.newDirectoryStream(Files.java:457)
              at java.nio.file.Files.list(Files.java:3451)
              at org.nuxeo.lib.stream.log.chronicle.ChronicleLogAppender.partitions(ChronicleLogAppender.java:290)
              ... 127 more
      

      We can see there are lot of the open file descriptors on directory-listing.cq4t (7476 out of 8168).

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 1 day
                  1d