Nuxeo Platform / NXP-30233

Bulk Scroller should complete command in error when query times out

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.10, 11.4, 2021.1
    • Fix Version/s: 10.10-HF44, 11.5, 2021.2
    • Component/s: Bulk
    • Release Notes Summary:
      The Bulk scroller completes a command in error when a query times out.
    • Team:
      PLATFORM
    • Sprint:
      nxplatform #30
    • Story Points:
      3

      Description

      The Bulk Scroller materializes the document set for a bulk command.

      It executes an NXQL query that can fail, for example by timing out when the required indexes are missing. In this case the scroller fails and terminates, the command is reassigned to another valid scroller, and it fails again for the same reason.
      The result is Nuxeo Stream failures that require intervention, and a backend overwhelmed by multiple heavy queries (on MongoDB, the queries continue to run).

      Such cases are common on large repositories when submitting a query whose clauses use custom fields that are not indexed.

      On MongoDB this translates into a MongoSocketReadTimeoutException:

      2020-03-09T08:20:01,164 WARN  [scrollerPool-00,in:0,inCheckpoint:0,out:0,lastRead:1583741639788,lastTimer:0,wm:0,loop:30,rebalance assigned] [org.nuxeo.lib.stream.computation.AbstractComputation] Computation: scroller fails last record: command-00:+139, retrying ...
      com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
      	at com.mongodb.internal.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:537) ~[mongo-java-driver-3.8.1.jar:?]
      ...
      
      	at org.nuxeo.ecm.core.storage.dbs.DBSCachingRepository.scroll(DBSCachingRepository.java:390) ~[nuxeo-core-storage-dbs-10.10-HF21.jar:?]
      	at org.nuxeo.ecm.core.storage.dbs.DBSSession.scroll(DBSSession.java:1865) 
      ...
      	at org.nuxeo.ecm.core.scroll.RepositoryScroll.fetch(RepositoryScroll.java:88) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
      	at org.nuxeo.ecm.core.scroll.RepositoryScroll.hasNext(RepositoryScroll.java:81) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
      	at org.nuxeo.ecm.core.bulk.computation.BulkScrollerComputation.processRecord(BulkScrollerComputation.java:122) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
      	at org.nuxeo.ecm.core.bulk.computation.BulkScrollerComputation.lambda$processRecord$0(BulkScrollerComputation.java:108) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
      	at org.nuxeo.runtime.transaction.TransactionHelper.lambda$runInTransaction$5(TransactionHelper.java:587) ~[nuxeo-runtime-jtajca-10.10-HF14.jar:?]
      	at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:607) ~[nuxeo-runtime-jtajca-10.10-HF14.jar:?]
      	at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:587) ~[nuxeo-runtime-jtajca-10.10-HF14.jar:?]
      	at org.nuxeo.ecm.core.bulk.computation.BulkScrollerComputation.processRecord(BulkScrollerComputation.java:108) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.lambda$processRecordWithRetry$10(ComputationRunner.java:366) ~[nuxeo-stream-10.10-HF17.jar:?]

       

      The Scroller should catch this timeout and mark the command as completed in error.

      This will avoid unnecessary retries that overwhelm the backend and potentially create duplicate downstream processing.
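
The proposed behavior can be sketched as follows. This is only a minimal illustration under stated assumptions, not the actual change made to BulkScrollerComputation: the names ScrollerTimeoutSketch, QueryTimeoutException, CommandResult and timingOutScroll are hypothetical, and QueryTimeoutException stands in for a backend timeout such as MongoSocketReadTimeoutException. The idea is that the timeout is caught inside the scroll loop and turned into a terminal "completed in error" result, instead of propagating to the stream runtime where it would trigger a retry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ScrollerTimeoutSketch {

    /** Hypothetical stand-in for a backend read timeout (e.g. MongoSocketReadTimeoutException). */
    public static class QueryTimeoutException extends RuntimeException {
        public QueryTimeoutException(String message) {
            super(message);
        }
    }

    /** Hypothetical, simplified final state of a bulk command. */
    public static final class CommandResult {
        public final boolean error;
        public final String errorMessage;
        public final List<String> documentIds;

        private CommandResult(boolean error, String errorMessage, List<String> documentIds) {
            this.error = error;
            this.errorMessage = errorMessage;
            this.documentIds = documentIds;
        }

        static CommandResult success(List<String> ids) {
            return new CommandResult(false, null, ids);
        }

        static CommandResult failure(String message) {
            return new CommandResult(true, message, Collections.emptyList());
        }
    }

    /**
     * Materializes the document set of a command. A query timeout is treated as
     * terminal: the command is completed in error and the exception is not rethrown.
     */
    public static CommandResult scroll(String commandId, Iterator<List<String>> scroll) {
        List<String> ids = new ArrayList<>();
        try {
            // hasNext() is where the backend query runs, so this is where a timeout surfaces
            while (scroll.hasNext()) {
                ids.addAll(scroll.next());
            }
        } catch (QueryTimeoutException e) {
            // Rethrowing here would fail the computation and cause the same heavy query to be retried
            return CommandResult.failure("Command " + commandId + " scroll timed out: " + e.getMessage());
        }
        return CommandResult.success(ids);
    }

    /** Test helper: a scroll whose first fetch times out. */
    public static Iterator<List<String>> timingOutScroll() {
        return new Iterator<List<String>>() {
            @Override
            public boolean hasNext() {
                throw new QueryTimeoutException("Timeout while receiving message");
            }

            @Override
            public List<String> next() {
                throw new NoSuchElementException();
            }
        };
    }

    public static void main(String[] args) {
        CommandResult ok = scroll("command-ok",
                Arrays.asList(Arrays.asList("doc1", "doc2"), Arrays.asList("doc3")).iterator());
        System.out.println("error=" + ok.error + " docs=" + ok.documentIds);

        CommandResult ko = scroll("command-timeout", timingOutScroll());
        System.out.println("error=" + ko.error + " message=" + ko.errorMessage);
    }
}
```

In this sketch a timed-out command returns an empty document list, so nothing is handed to downstream processing for it. Whether partially scrolled documents should be discarded or kept is a design decision the sketch does not settle.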

       
