-
Type: Bug
-
Status: Resolved
-
Priority: Major
-
Resolution: Fixed
-
Affects Version/s: 11.1-SNAPSHOT
-
Fix Version/s: 10.10-HF14, 11.1, 2021.0
-
Component/s: Streams
-
Tags:
-
Team:PLATFORM
-
Sprint:nxplatform 11.1.17
-
Story Points:2
A computation that handles document must define a retry policy that catches ConcurrentUpdateException in order to effectively apply the retry.
It is not enough to define a number of retries if the policy doesn't list the exceptions to catch,
for instance, the following computation defines a maxRetries=3:
<streamProcessor name="setProperties" class="org.nuxeo.ecm.core.bulk.action.SetPropertiesAction" logConfig="bulk" defaultConcurrency="2" defaultPartitions="2"> <policy name="default" maxRetries="3" delay="500ms" maxDelay="10s" continueOnFailure="false" /> </streamProcessor>
But the default policy doesn't list any Exception to handle, the result is that the computation halt on the first error.
We should add a policy.retryOn(ConcurrentUpdateException.class).
This has been seen on CI benchmarks with a SQL backend while the BulkUpdateFolder simulation is using the Bulk Service to update the description of a folder in a concurrent way. It is possible that concurrently 2 or more Bulk commands update the description of the same document, so concurrent exceptions are expected.
In this case we see that only one error is logged:
2019-08-30T11:38:00,494 ERROR [setPropertiesPool-03,in:251,inCheckpoint:251,out:251,lastRead:1567164640505,lastTimer:0,wm:205411324003483649,loop:17204,checkpoint] [org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation] Actio n: setProperties fails on record: setProperties-07:+77902116814973 after retries. org.nuxeo.ecm.core.api.ConcurrentUpdateException: Concurrent update at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCConnection.checkConcurrentUpdate(JDBCConnection.java:183) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.insertCollectionRows(JDBCRowMapper.java:569) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.updateCollectionRows(JDBCRowMapper.java:667) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.writeUpdates(JDBCRowMapper.java:485) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.write(JDBCRowMapper.java:442) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at jdk.internal.reflect.GeneratedMethodAccessor183.invoke(Unknown Source) ~[?:?] at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?] at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.doInvoke(JDBCMapperConnector.java:111) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.lambda$doConnectAndInvoke$0(JDBCMapperConnector.java:98) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:603) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.doConnectAndInvoke(JDBCMapperConnector.java:95) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapperConnector.invoke(JDBCMapperConnector.java:84) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at com.sun.proxy.$Proxy71.write(Unknown Source) ~[?:?] at org.nuxeo.ecm.core.storage.sql.SoftRefCachingRowMapper.write(SoftRefCachingRowMapper.java:362) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.SessionImpl.doFlush(SessionImpl.java:359) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.SessionImpl.flush(SessionImpl.java:331) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.SessionImpl.end(SessionImpl.java:1347) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.apache.geronimo.transaction.manager.WrapperNamedXAResource.end(WrapperNamedXAResource.java:61) ~[geronimo-transaction-2.2.1.jar:2.2.1] at org.apache.geronimo.transaction.manager.TransactionImpl.endResources(TransactionImpl.java:569) ~[geronimo-transaction-2.2.1.jar:2.2.1] at org.apache.geronimo.transaction.manager.TransactionImpl.endResources(TransactionImpl.java:548) ~[geronimo-transaction-2.2.1.jar:2.2.1] at org.apache.geronimo.transaction.manager.TransactionImpl.beforePrepare(TransactionImpl.java:408) ~[geronimo-transaction-2.2.1.jar:2.2.1] at org.apache.geronimo.transaction.manager.TransactionImpl.commit(TransactionImpl.java:255) ~[geronimo-transaction-2.2.1.jar:2.2.1] at org.apache.geronimo.transaction.manager.TransactionManagerImpl.commit(TransactionManagerImpl.java:252) ~[geronimo-transaction-2.2.1.jar:2.2.1] at org.nuxeo.runtime.jtajca.NuxeoContainer$UserTransactionImpl.commit(NuxeoContainer.java:538) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?] at org.nuxeo.runtime.transaction.TransactionHelper.commitOrRollbackTransaction(TransactionHelper.java:389) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?] at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:613) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?] at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:583) ~[nuxeo-runtime-jtajca-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation.processBatchOfDocuments(AbstractBulkComputation.java:137) ~[nuxeo-core-bulk-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation.processRecord(AbstractBulkComputation.java:99) ~[nuxeo-core-bulk-11.1-SNAPSHOT.jar:?] at org.nuxeo.lib.stream.computation.log.ComputationRunner.lambda$processRecordWithRetry$10(ComputationRunner.java:366) ~[nuxeo-stream-11.1-SNAPSHOT.jar:?] at net.jodah.failsafe.Functions$10.call(Functions.java:252) ~[failsafe-1.1.0.jar:1.1.0] at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145) [failsafe-1.1.0.jar:1.1.0] at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:81) [failsafe-1.1.0.jar:1.1.0] at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecordWithRetry(ComputationRunner.java:366) [nuxeo-stream-11.1-SNAPSHOT.jar:?] at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecord(ComputationRunner.java:349) [nuxeo-stream-11.1-SNAPSHOT.jar:?] at org.nuxeo.lib.stream.computation.log.ComputationRunner.processLoop(ComputationRunner.java:239) [nuxeo-stream-11.1-SNAPSHOT.jar:?] at org.nuxeo.lib.stream.computation.log.ComputationRunner.run(ComputationRunner.java:184) [nuxeo-stream-11.1-SNAPSHOT.jar:?] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at java.lang.Thread.run(Thread.java:834) [?:?] Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "dc_contributors_unique_pos" Detail: Key (id, pos)=(9d00bfd1-1b11-40d1-bda5-a47d52b71eda, 1) already exists. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440) ~[postgresql-42.2.5.jar:42.2.5] at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183) ~[postgresql-42.2.5.jar:42.2.5] at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308) ~[postgresql-42.2.5.jar:42.2.5] at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) ~[postgresql-42.2.5.jar:42.2.5] at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) ~[postgresql-42.2.5.jar:42.2.5] at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:143) ~[postgresql-42.2.5.jar:42.2.5] at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:132) ~[postgresql-42.2.5.jar:42.2.5] at org.tranql.connector.jdbc.PreparedStatementHandle.execute(PreparedStatementHandle.java:177) ~[tranql-connector-1.8.jar:1.8] at org.nuxeo.ecm.core.storage.sql.jdbc.ScalarCollectionIO.executeInserts(ScalarCollectionIO.java:122) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.insertCollectionRows(JDBCRowMapper.java:567) ~[nuxeo-core-storage-sql-11.1-SNAPSHOT.jar:?] ... 41 more 2019-08-30T11:38:00,509 ERROR [setPropertiesPool-03,in:251,inCheckpoint:251,out:251,lastRead:1567164640505,lastTimer:0,wm:205411324003483649,loop:17204,checkpoint] [org.nuxeo.lib.stream.computation.log.ComputationRunner] Terminate computation: setProperties due to previous failure
When not using Kafka this slow down the benchmark waiting for 1h timeout request.