  1. Nuxeo Platform
  2. NXP-31616

Make workflow escalation check scalable

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2023.0, 2021.34
    • Component/s: Workflow
    • Release Notes Summary:
      Workflow Escalation Rules execution on Bulk Action Framework
    • Release Notes Description:
      To improve resilience and scaling, the Workflow Escalation Rules are now performed with the Bulk Action Framework.

      You can re-enable the WorkManager implementation by setting this nuxeo.conf property:

      nuxeo.document.routing.escalation.legacy=true
      
    • Backlog priority:
      900
    • Team:
      PLATFORM
    • Sprint:
      nxplatform #80, nxplatform #81
    • Story Points:
      5

      Description

      The check should be done with a bulk action in order to handle large query results, parallelize processing, and use batching to avoid transaction timeouts.
      We also want to avoid concurrent executions: if a check is already in progress, a warning should be logged.
      The duration of the processing depends on the number of escalation rules to process.
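The batching idea above can be sketched in plain Java. This is a minimal illustration, not the Bulk Action Framework API: `EscalationBatcher`, `processInBatches`, and the `nodeIds` parameter are hypothetical names, and the assumption is that each handler call would run in its own short transaction so that no single transaction spans the whole result set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Nuxeo: feeds work to a handler in fixed-size batches. */
final class EscalationBatcher {

    private EscalationBatcher() {
    }

    /**
     * Calls the handler once per batch of at most batchSize ids and returns the number of batches.
     * Assumption: in a real system each handler call would be wrapped in its own transaction.
     */
    static int processInBatches(List<String> nodeIds, int batchSize, Consumer<List<String>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < nodeIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, nodeIds.size());
            // Copy the slice so the handler never holds a view of the caller's list.
            handler.accept(new ArrayList<>(nodeIds.subList(start, end)));
            batches++;
        }
        return batches;
    }
}
```

With a bounded batch size, the duration of each unit of work depends on the batch size rather than on the total number of escalation rules, which is what keeps individual transactions under the timeout.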


      It has been reported that the Escalation Rules mechanism can fail with a transaction timeout when there are ~500k tasks that each have a condition and an action.

      Since raising the transaction timeout ever higher is not a long-term option, consider the case where there are many (1 billion) routing tasks, each with an escalation rule condition and an associated action. The goal is for the Escalation Rule service to complete its execution without a timeout such as the one below:

      2022-12-17T03:01:22,075 ERROR [Quartz_Worker-1] [org.quartz.core.JobRunShell] Job nuxeo.escalationScheduler threw an unhandled Exception: 
      org.nuxeo.runtime.transaction.TransactionRuntimeException: Transaction has timed out: Tx started: 1671242481924, timeout: 1671246081924 (duration 3600s), current: 1671246082055
      	at org.nuxeo.runtime.transaction.TransactionHelper.checkTransactionTimeout(TransactionHelper.java:273) ~[nuxeo-runtime-jtajca-10.10-HF63.jar:?]
      	at org.nuxeo.ecm.core.api.local.LocalSession.getSession(LocalSession.java:108) ~[nuxeo-core-10.10-HF66.jar:?]
      	at org.nuxeo.ecm.core.api.AbstractSession.saveDocument(AbstractSession.java:1657) ~[nuxeo-core-10.10-HF66.jar:?]
      	at org.nuxeo.ecm.platform.routing.core.impl.GraphNodeImpl.saveDocument(GraphNodeImpl.java:142) ~[nuxeo-routing-core-10.10-HF67.jar:?]
      	at org.nuxeo.ecm.platform.routing.core.impl.GraphNodeImpl.evaluateEscalationRules(GraphNodeImpl.java:889) ~[nuxeo-routing-core-10.10-HF67.jar:?]
      	at org.nuxeo.ecm.platform.routing.core.impl.DocumentRoutingEscalationServiceImpl.computeEscalationRulesToExecute(DocumentRoutingEscalationServiceImpl.java:70) ~[nuxeo-routing-core-10.10-HF67.jar:?]
      	at org.nuxeo.ecm.platform.routing.core.listener.DocumentRoutingEscalationListener$1.run(DocumentRoutingEscalationListener.java:65) ~[nuxeo-routing-core-10.10-HF67.jar:?]
      	at org.nuxeo.ecm.core.api.UnrestrictedSessionRunner.runUnrestricted(UnrestrictedSessionRunner.java:137) ~[nuxeo-core-api-10.10-HF65.jar:?]
      	at org.nuxeo.ecm.platform.routing.core.listener.DocumentRoutingEscalationListener.triggerEsclationRulesExecution(DocumentRoutingEscalationListener.java:71) ~[nuxeo-routing-core-10.10-HF67.jar:?]
      	at org.nuxeo.ecm.platform.routing.core.listener.DocumentRoutingEscalationListener.handleEvent(DocumentRoutingEscalationListener.java:51) ~[nuxeo-routing-core-10.10-HF67.jar:?]
      	at org.nuxeo.ecm.core.event.impl.EventServiceImpl.fireEvent(EventServiceImpl.java:243) ~[nuxeo-core-event-10.10-HF66.jar:?]
      	at org.nuxeo.ecm.core.scheduler.EventJob.execute(EventJob.java:119) ~[nuxeo-core-event-10.10-HF66.jar:?]
      	at org.nuxeo.ecm.core.scheduler.EventJob.execute(EventJob.java:65) ~[nuxeo-core-event-10.10-HF66.jar:?]
      	at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.3.2.jar:?]
      	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.3.2.jar:?]

      Note that one particular difficulty is that the escalation rule schedule is configured to run every 10 minutes, so the solution must either support parallel executions or block a new execution while one is already running.
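The "block the other execution" option can be sketched as a simple guard that skips a scheduled run while a previous one is still in progress and logs a warning. This is an illustration only: `EscalationCheckGuard` and `tryRun` are hypothetical names, and an in-memory flag like this only protects a single JVM; a clustered deployment would need a shared lock or an equivalent mechanism, which is not shown here.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Logger;

/** Hypothetical guard, not part of Nuxeo: allows at most one escalation check at a time per JVM. */
final class EscalationCheckGuard {

    private static final Logger LOG = Logger.getLogger(EscalationCheckGuard.class.getName());

    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Runs the check unless another one is already in progress.
     * Returns true if the check ran, false if it was skipped.
     */
    boolean tryRun(Runnable check) {
        // compareAndSet guarantees that only one caller flips the flag from false to true.
        if (!running.compareAndSet(false, true)) {
            LOG.warning("Escalation check already in progress, skipping this run");
            return false;
        }
        try {
            check.run();
            return true;
        } finally {
            // Always release the flag, even if the check throws.
            running.set(false);
        }
    }
}
```

Skipping rather than queuing keeps a slow run from piling up behind the 10-minute schedule: the next trigger simply picks up whatever is still due.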
