Uploaded image for project: 'Nuxeo Platform'
  1. Nuxeo Platform
  2. NXP-22961

Improve interrupted state and exception handling

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.3
    • Component/s: Runtime

      Description

      We want to better handle handling of interrupted state/exception.
      We already did some work with NXP-22955 about logging.

      Now we also want to improve logging in WorkManager especially in the exception handler given to the ThreadFactory.
      Furthermore, we can try to not retry a work when we detect this state, for example on a hot reload test we have:

      2017-08-23 11:57:48,276 ERROR [Nuxeo-Work-elasticSearchIndexing-3] [org.nuxeo.ecm.core.work.WorkManagerImpl] Uncaught error on thread Nuxeo-Work-elasticSearchIndexing-3
      java.lang.RuntimeException: Work failed after 1 retry, class=class org.nuxeo.elasticsearch.work.ScrollingIndexingWorker id=99524386413355.713240742 category=elasticSearchIndexing title=Elasticsearch scrolling indexer: SELECT ecm:uuid FROM Document WHERE ecm:ancestorId = '029b9890-e4f0-42b4-b24c-7af426622b85', processed 0
      	at org.nuxeo.ecm.core.work.AbstractWork.run(AbstractWork.java:348)
      	at org.nuxeo.ecm.core.work.WorkHolder.run(WorkHolder.java:57)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: org.nuxeo.ecm.core.api.NuxeoException: Error on query
      	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapper.scrollSearch(JDBCMapper.java:945)
      	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapper.scroll(JDBCMapper.java:909)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.doInvoke(JDBCMapperConnector.java:52)
      	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.lambda$doConnectAndInvoke$0(JDBCMapperConnector.java:101)
      	at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:589)
      	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.doConnectAndInvoke(JDBCMapperConnector.java:97)
      	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.invoke(JDBCMapperConnector.java:92)
      	at com.sun.proxy.$Proxy65.scroll(Unknown Source)
      	at org.nuxeo.ecm.core.storage.sql.SoftRefCachingMapper.scroll(SoftRefCachingMapper.java:57)
      	at org.nuxeo.ecm.core.storage.sql.SessionImpl.scroll(SessionImpl.java:894)
      	at org.nuxeo.ecm.core.storage.sql.ra.ConnectionImpl.scroll(ConnectionImpl.java:208)
      	at org.nuxeo.ecm.core.storage.sql.coremodel.SQLSession.scroll(SQLSession.java:164)
      	at org.nuxeo.ecm.core.api.AbstractSession.scroll(AbstractSession.java:1304)
      	at org.nuxeo.elasticsearch.work.ScrollingIndexingWorker.doWork(ScrollingIndexingWorker.java:74)
      	at org.nuxeo.elasticsearch.work.BaseIndexingWorker.work(BaseIndexingWorker.java:48)
      	at org.nuxeo.ecm.core.work.AbstractWork.runWorkWithTransaction(AbstractWork.java:373)
      	at org.nuxeo.ecm.core.work.AbstractWork.run(AbstractWork.java:332)
      	... 4 more
      	Suppressed: org.nuxeo.ecm.core.api.NuxeoException: javax.resource.ResourceException: Interrupted while requesting permit.
      		at org.nuxeo.ecm.core.storage.sql.ra.ConnectionFactoryImpl.getConnection(ConnectionFactoryImpl.java:117)
      		at org.nuxeo.ecm.core.storage.sql.ra.ConnectionFactoryImpl.getSession(ConnectionFactoryImpl.java:166)
      		at org.nuxeo.ecm.core.repository.RepositoryService.getSession(RepositoryService.java:204)
      		at org.nuxeo.ecm.core.api.local.LocalSession.createSession(LocalSession.java:127)
      		at org.nuxeo.ecm.core.api.local.LocalSession.<init>(LocalSession.java:84)
      		at org.nuxeo.ecm.core.api.CoreSessionServiceImpl.createCoreSession(CoreSessionServiceImpl.java:43)
      		at org.nuxeo.ecm.core.api.CoreInstance.openCoreSession(CoreInstance.java:163)
      		at org.nuxeo.ecm.core.api.CoreInstance.openCoreSessionSystem(CoreInstance.java:114)
      		at org.nuxeo.ecm.core.work.AbstractWork.openSystemSession(AbstractWork.java:263)
      		at org.nuxeo.elasticsearch.work.ScrollingIndexingWorker.doWork(ScrollingIndexingWorker.java:72)
      		... 7 more
      	Caused by: javax.resource.ResourceException: Interrupted while requesting permit.
      		at org.apache.geronimo.connector.outbound.AbstractSinglePoolConnectionInterceptor.getConnection(AbstractSinglePoolConnectionInterceptor.java:97)
      		at org.nuxeo.runtime.jtajca.NuxeoPool$1.getConnection(NuxeoPool.java:66)
      		at org.apache.geronimo.connector.outbound.ThreadLocalCachingConnectionInterceptor.getConnection(ThreadLocalCachingConnectionInterceptor.java:70)
      		at org.apache.geronimo.connector.outbound.TransactionEnlistingInterceptor.getConnection(TransactionEnlistingInterceptor.java:49)
      		at org.apache.geronimo.connector.outbound.TransactionCachingInterceptor.getConnection(TransactionCachingInterceptor.java:101)
      		at org.apache.geronimo.connector.outbound.ConnectionHandleInterceptor.getConnection(ConnectionHandleInterceptor.java:43)
      		at org.apache.geronimo.connector.outbound.TCCLInterceptor.getConnection(TCCLInterceptor.java:39)
      		at org.apache.geronimo.connector.outbound.ConnectionTrackingInterceptor.getConnection(ConnectionTrackingInterceptor.java:66)
      		at org.apache.geronimo.connector.outbound.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:77)
      		at org.nuxeo.runtime.jtajca.NuxeoContainer$ConnectionManagerWrapper.allocateConnection(NuxeoContainer.java:764)
      		at org.nuxeo.ecm.core.storage.sql.ra.ConnectionFactoryImpl.getConnection(ConnectionFactoryImpl.java:103)
      		... 16 more
      	Caused by: java.lang.InterruptedException
      		at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
      		at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
      		at org.apache.geronimo.connector.outbound.AbstractSinglePoolConnectionInterceptor.getConnection(AbstractSinglePoolConnectionInterceptor.java:78)
      		... 26 more
      Caused by: org.h2.jdbc.JdbcSQLException: General error: "java.lang.IllegalStateException: Reading from cache:nio:/opt/jenkins/workspace/TestAndPush/ondemand-testandpush-atchertchian-master-2/nuxeo-distribution/nuxeo-jsf-ui-hotreload-tests/target/tomcat/nxserver/data/h2/nuxeo.mv.db failed; file length -1 read length 768 at 6564231 [1.4.177/1]"; SQL statement:
      SELECT "_C1" FROM (SELECT "HIERARCHY"."ID" AS "_C1" FROM "HIERARCHY" WHERE (("HIERARCHY"."PRIMARYTYPE" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)) AND (NX_IN_TREE("HIERARCHY"."ID", ?))) UNION ALL SELECT "_H"."ID" AS "_C1" FROM "HIERARCHY" "_H" JOIN "PROXIES" ON "_H"."ID" = "PROXIES"."ID" JOIN "HIERARCHY" ON "PROXIES"."TARGETID" = "HIERARCHY"."ID" WHERE (("HIERARCHY"."PRIMARYTYPE" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)) AND (NX_IN_TREE("_H"."ID", ?)))) [50000-177]
      	at org.h2.message.DbException.getJdbcSQLException(DbException.java:344)
      	at org.h2.message.DbException.get(DbException.java:167)
      	at org.h2.message.DbException.convert(DbException.java:294)
      	at org.h2.command.Command.executeQuery(Command.java:209)
      	at org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:109)
      	at org.tranql.connector.jdbc.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:52)
      	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapper.scrollSearch(JDBCMapper.java:940)
      	... 24 more
      Caused by: java.lang.IllegalStateException: Reading from cache:nio:/opt/jenkins/workspace/TestAndPush/ondemand-testandpush-atchertchian-master-2/nuxeo-distribution/nuxeo-jsf-ui-hotreload-tests/target/tomcat/nxserver/data/h2/nuxeo.mv.db failed; file length -1 read length 768 at 6564231 [1.4.177/1]
      	at org.h2.mvstore.DataUtils.newIllegalStateException(DataUtils.java:757)
      	at org.h2.mvstore.DataUtils.readFully(DataUtils.java:427)
      	at org.h2.mvstore.FileStore.readFully(FileStore.java:99)
      	at org.h2.mvstore.Page.read(Page.java:195)
      	at org.h2.mvstore.MVStore.readPage(MVStore.java:1615)
      	at org.h2.mvstore.MVMap.readPage(MVMap.java:733)
      	at org.h2.mvstore.Page.getChildPage(Page.java:222)
      	at org.h2.mvstore.MVMap.binarySearch(MVMap.java:446)
      	at org.h2.mvstore.MVMap.get(MVMap.java:428)
      	at org.h2.mvstore.db.TransactionStore$TransactionMap.getValue(TransactionStore.java:1186)
      	at org.h2.mvstore.db.TransactionStore$TransactionMap.get(TransactionStore.java:1164)
      	at org.h2.mvstore.db.TransactionStore$TransactionMap.get(TransactionStore.java:1131)
      	at org.h2.mvstore.db.MVPrimaryIndex.getRow(MVPrimaryIndex.java:207)
      	at org.h2.mvstore.db.MVTable.getRow(MVTable.java:367)
      	at org.h2.mvstore.db.MVSecondaryIndex$MVStoreCursor.get(MVSecondaryIndex.java:491)
      	at org.h2.index.IndexCursor.get(IndexCursor.java:261)
      	at org.h2.table.TableFilter.getValue(TableFilter.java:898)
      	at org.h2.expression.ExpressionColumn.getValue(ExpressionColumn.java:187)
      	at org.h2.engine.FunctionAlias$JavaMethod.getValue(FunctionAlias.java:426)
      	at org.h2.expression.JavaFunction.getValue(JavaFunction.java:39)
      	at org.h2.expression.ConditionAndOr.getValue(ConditionAndOr.java:94)
      	at org.h2.expression.Expression.getBooleanValue(Expression.java:179)
      	at org.h2.command.dml.Select.queryFlat(Select.java:535)
      	at org.h2.command.dml.Select.queryWithoutCache(Select.java:646)
      	at org.h2.command.dml.Query.query(Query.java:323)
      	at org.h2.command.dml.Query.query(Query.java:291)
      	at org.h2.command.dml.SelectUnion.queryWithoutCache(SelectUnion.java:194)
      	at org.h2.command.dml.SelectUnion.query(SelectUnion.java:424)
      	at org.h2.command.dml.Query.query(Query.java:291)
      	at org.h2.index.ViewIndex.find(ViewIndex.java:283)
      	at org.h2.index.ViewIndex.find(ViewIndex.java:179)
      	at org.h2.index.BaseIndex.find(BaseIndex.java:128)
      	at org.h2.index.IndexCursor.find(IndexCursor.java:160)
      	at org.h2.table.TableFilter.next(TableFilter.java:330)
      	at org.h2.command.dml.Select.queryFlat(Select.java:533)
      	at org.h2.command.dml.Select.queryWithoutCache(Select.java:646)
      	at org.h2.command.dml.Query.query(Query.java:323)
      	at org.h2.command.dml.Query.query(Query.java:291)
      	at org.h2.command.dml.Query.query(Query.java:37)
      	at org.h2.command.CommandContainer.query(CommandContainer.java:91)
      	at org.h2.command.Command.executeQuery(Command.java:197)
      	... 27 more
      Caused by: java.nio.channels.ClosedByInterruptException
      	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
      	at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
      	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
      	at org.h2.store.fs.FileNio.read(FilePathNio.java:75)
      	at org.h2.mvstore.cache.FilePathCache$FileCache.read(FilePathCache.java:82)
      	at org.h2.mvstore.DataUtils.readFully(DataUtils.java:413)
      	... 66 more
      

      This shows that we retry to run the work whereas H2 threw an exception closed to an interrupt one.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 6 hours
                  6h