Uploaded image for project: 'Nuxeo Platform'
  1. Nuxeo Platform
  2. NXP-27979

Retry policy must handle ConcurrentUpdateException

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 11.1-SNAPSHOT
    • Fix Version/s: 10.10-HF14, 11.1, 2021.0
    • Component/s: Streams
    • Team:
      PLATFORM
    • Sprint:
      nxplatform 11.1.17
    • Story Points:
      2

      Description

      A computation that handles document must define a retry policy that catches ConcurrentUpdateException in order to effectively apply the retry.

      It is not enough to define a number of retries if the policy doesn't list the exceptions to catch,
      for instance, the following computation defines a maxRetries=3:

          <streamProcessor name="setProperties" class="org.nuxeo.ecm.core.bulk.action.SetPropertiesAction" logConfig="bulk"
            defaultConcurrency="2" defaultPartitions="2">
            <policy name="default" maxRetries="3" delay="500ms" maxDelay="10s" continueOnFailure="false" />
          </streamProcessor>
      

      But the default policy doesn't list any Exception to handle, the result is that the computation halt on the first error.
      We should add a policy.retryOn(ConcurrentUpdateException.class).

      This has been seen on CI benchmarks with a SQL backend while the BulkUpdateFolder simulation is using the Bulk Service to update the description of a folder in a concurrent way. It is possible that concurrently 2 or more Bulk commands update the description of the same document, so concurrent exceptions are expected.
      In this case we see that only one error is logged:

      2019-08-30T11:38:00,494 ERROR [setPropertiesPool-03,in:251,inCheckpoint:251,out:251,lastRead:1567164640505,lastTimer:0,wm:205411324003483649,loop:17204,checkpoint] [org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation] Actio
      n: setProperties fails on record: setProperties-07:+77902116814973 after retries.
      org.nuxeo.ecm.core.api.ConcurrentUpdateException: Concurrent update
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCConnection.checkConcurrentUpdate(JDBCConnection.java:183) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.insertCollectionRows(JDBCRowMapper.java:569) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.updateCollectionRows(JDBCRowMapper.java:667) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.writeUpdates(JDBCRowMapper.java:485) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.write(JDBCRowMapper.java:442) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at jdk.internal.reflect.GeneratedMethodAccessor183.invoke(Unknown Source) ~[?:?]
              at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
              at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.doInvoke(JDBCMapperConnector.java:111) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.lambda$doConnectAndInvoke$0(JDBCMapperConnector.java:98) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:603) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.doConnectAndInvoke(JDBCMapperConnector.java:95) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.invoke(JDBCMapperConnector.java:84) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at com.sun.proxy.$Proxy71.write(Unknown Source) ~[?:?]
              at org.nuxeo.ecm.core.storage.sql.SoftRefCachingRowMapper.write(SoftRefCachingRowMapper.java:362) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.SessionImpl.doFlush(SessionImpl.java:359) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.SessionImpl.flush(SessionImpl.java:331) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.SessionImpl.end(SessionImpl.java:1347) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.apache.geronimo.transaction.manager.WrapperNamedXAResource.end(WrapperNamedXAResource.java:61) ~[geronimo-transaction-2.2.1.jar:2.2.1]
              at org.apache.geronimo.transaction.manager.TransactionImpl.endResources(TransactionImpl.java:569) ~[geronimo-transaction-2.2.1.jar:2.2.1]
              at org.apache.geronimo.transaction.manager.TransactionImpl.endResources(TransactionImpl.java:548) ~[geronimo-transaction-2.2.1.jar:2.2.1]
              at org.apache.geronimo.transaction.manager.TransactionImpl.beforePrepare(TransactionImpl.java:408) ~[geronimo-transaction-2.2.1.jar:2.2.1]
              at org.apache.geronimo.transaction.manager.TransactionImpl.commit(TransactionImpl.java:255) ~[geronimo-transaction-2.2.1.jar:2.2.1]
              at org.apache.geronimo.transaction.manager.TransactionManagerImpl.commit(TransactionManagerImpl.java:252) ~[geronimo-transaction-2.2.1.jar:2.2.1]
              at org.nuxeo.runtime.jtajca.NuxeoContainer$UserTransactionImpl.commit(NuxeoContainer.java:538) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.runtime.transaction.TransactionHelper.commitOrRollbackTransaction(TransactionHelper.java:389) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:613) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:583) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation.processBatchOfDocuments(AbstractBulkComputation.java:137) ~[nuxeo-core-bulk-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation.processRecord(AbstractBulkComputation.java:99) ~[nuxeo-core-bulk-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.lib.stream.computation.log.ComputationRunner.lambda$processRecordWithRetry$10(ComputationRunner.java:366) ~[nuxeo-stream-11.1-SNAPSHOT.jar:?]
              at net.jodah.failsafe.Functions$10.call(Functions.java:252) ~[failsafe-1.1.0.jar:1.1.0]
              at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145) [failsafe-1.1.0.jar:1.1.0]
              at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:81) [failsafe-1.1.0.jar:1.1.0]
              at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecordWithRetry(ComputationRunner.java:366) [nuxeo-stream-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecord(ComputationRunner.java:349) [nuxeo-stream-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.lib.stream.computation.log.ComputationRunner.processLoop(ComputationRunner.java:239) [nuxeo-stream-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.lib.stream.computation.log.ComputationRunner.run(ComputationRunner.java:184) [nuxeo-stream-11.1-SNAPSHOT.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
              at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
              at java.lang.Thread.run(Thread.java:834) [?:?]
      Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "dc_contributors_unique_pos"
        Detail: Key (id, pos)=(9d00bfd1-1b11-40d1-bda5-a47d52b71eda, 1) already exists.
              at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440) ~[postgresql-42.2.5.jar:42.2.5]
              at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183) ~[postgresql-42.2.5.jar:42.2.5]
              at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308) ~[postgresql-42.2.5.jar:42.2.5]
              at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) ~[postgresql-42.2.5.jar:42.2.5]
              at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) ~[postgresql-42.2.5.jar:42.2.5]
              at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:143) ~[postgresql-42.2.5.jar:42.2.5]
              at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:132) ~[postgresql-42.2.5.jar:42.2.5]
              at org.tranql.connector.jdbc.PreparedStatementHandle.execute(PreparedStatementHandle.java:177) ~[tranql-connector-1.8.jar:1.8]
              at org.nuxeo.ecm.core.storage.sql.jdbc.ScalarCollectionIO.executeInserts(ScalarCollectionIO.java:122) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.insertCollectionRows(JDBCRowMapper.java:567) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?]
              ... 41 more
      2019-08-30T11:38:00,509 ERROR [setPropertiesPool-03,in:251,inCheckpoint:251,out:251,lastRead:1567164640505,lastTimer:0,wm:205411324003483649,loop:17204,checkpoint] [org.nuxeo.lib.stream.computation.log.ComputationRunner] Terminate computation: setProperties due to previous failure
      

      When not using Kafka this slow down the benchmark waiting for 1h timeout request.

        Attachments

          Activity

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 4 hours
                4h