-
Type: Bug
-
Status: Resolved
-
Priority: Major
-
Resolution: Fixed
-
Affects Version/s: 10.10
-
Fix Version/s: 10.10-HF44, 11.x, 2021.19
-
Component/s: Scheduler
-
Release Notes Summary: Delay Quartz scheduler start for 5s by default.
-
Tags:
-
Backlog priority: 800
-
Sprint: nxsupport 14, nxplatform #58, nxplatform #59
-
Story Points: 1
This problem was found in a K8s environment with 6 worker nodes (2 should be enough to reproduce it). When the response time of the PostgreSQL database is high, starting the Quartz scheduler can cause a unique key constraint violation.
Analysis of the Nuxeo traces and the PostgreSQL query logs showed that two concurrent threads run queries during the initialization of the Quartz data, and both try to acquire the TRIGGER_ACCESS lock in the qrtz_LOCKS table:
Caused by: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:143) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:106) ~[postgresql-42.2.5.jar:42.2.5]
    at org.tranql.connector.jdbc.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:52) ~[tranql-connector-1.8.jar:1.8]
    at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:96) ~[quartz-2.3.0.jar:?]
    ... 59 more
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "qrtz_locks_pkey"
  Detail: Key (sched_name, lock_name)=(Quartz, TRIGGER_ACCESS) already exists.
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:143) ~[postgresql-42.2.5.jar:42.2.5]
    at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:120) ~[postgresql-42.2.5.jar:42.2.5]
    at org.tranql.connector.jdbc.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:186) ~[tranql-connector-1.8.jar:1.8]
    at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:108) ~[quartz-2.3.0.jar:?]
    ... 59 more
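The two traces look like a check-then-insert race on the lock row: an insert of the TRIGGER_ACCESS row is rejected as a duplicate, and the following query fails because the transaction is already aborted. The sketch below reproduces that interleaving in memory. It assumes, based on the trace only and not confirmed in this ticket, that each thread first looks for the row and inserts it when it is missing. `LockRowRace`, `LockTable` and `runRace` are hypothetical names, and the barrier stands in for the unlucky timing that database lag produces.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

public class LockRowRace {

    // Hypothetical in-memory stand-in for a table with a primary key on the lock name.
    static final class LockTable {
        private final Set<String> rows = ConcurrentHashMap.newKeySet();

        boolean exists(String key) {
            return rows.contains(key);
        }

        void insert(String key) {
            // Mimics a unique constraint: a second insert of the same key is rejected.
            if (!rows.add(key)) {
                throw new IllegalStateException("duplicate key: " + key);
            }
        }
    }

    /** Runs the check-then-insert sequence on several threads; returns the number of rejected inserts. */
    static int runRace(int threads) throws InterruptedException {
        LockTable table = new LockTable();
        CyclicBarrier allChecked = new CyclicBarrier(threads);
        AtomicInteger violations = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    boolean found = table.exists("TRIGGER_ACCESS");
                    // Force every thread to finish its check before any thread inserts.
                    allChecked.await();
                    if (!found) {
                        table.insert("TRIGGER_ACCESS");
                    }
                } catch (IllegalStateException e) {
                    violations.incrementAndGet();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return violations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected inserts: " + runRace(2));
    }
}
```

With two threads, both see an empty table, one insert succeeds and the other is rejected, which matches the single duplicate key error reported above.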
A simple solution is to delay the start of the Quartz scheduler.
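As an illustration of the idea, and not the actual Nuxeo change, the following sketch defers a start action by a fixed delay using only the JDK. `DelayedStart` and `startAfterDelay` are hypothetical names. Quartz itself offers `Scheduler.startDelayed(int seconds)` for the same purpose, and the 5 s value comes from the release notes summary of this ticket.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedStart {

    /**
     * Schedules startAction to run once after delayMillis on a background thread.
     * The returned latch reaches zero when the action has finished (or thrown).
     */
    static CountDownLatch startAfterDelay(Runnable startAction, long delayMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch started = new CountDownLatch(1);
        executor.schedule(() -> {
            try {
                startAction.run();
            } finally {
                started.countDown();
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
        // With the default policy, an already scheduled delayed task still runs after
        // shutdown(), so the worker thread ends once the start action has completed.
        executor.shutdown();
        return started;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder for the real scheduler start; 5 seconds mirrors the default in this ticket.
        CountDownLatch started = startAfterDelay(
                () -> System.out.println("scheduler start requested"), 5_000);
        started.await();
    }
}
```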
Note that this issue is different from the problems with concurrent startups, for which Nuxeo has to implement a cluster-wide lock. However, it is possible that the cluster-wide lock will also be required here.
- is related to
-
NXP-30970 QuartzScheduler unable to start creating build failure during ftest
- Resolved