[NXP-30233] Bulk Scroller should complete command in error when query times out - Nuxeo Issue Tracker

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 10.10, 11.4, 2021.1
Fix Version/s: 10.10-HF44, 11.5, 2021.2
Component/s: Bulk

Release Notes Summary:
The Bulk scroller completes a command in error when a query times out.
Tags:
- nxplatform
- platform-review
Team:
PLATFORM
Sprint:
nxplatform #30
Story Points:
3

Description

The Bulk Scroller materializes the document set for a bulk command.

It executes an NXQL query that can fail when the required indexes are missing. In that case the scroller fails and terminates. The command is then reassigned to another scroller, which fails again for the same reason.
The result is Nuxeo Stream failures that require manual intervention. The backend is also overwhelmed by multiple heavy queries (on MongoDB, the timed-out queries continue to run).

Such cases are common on large repositories when a query filters on custom fields that are not indexed.
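
The small Java model below shows how much load the current behavior creates. Everything in it is hypothetical: RetryAmplificationSketch, QueryTimeoutException, the my:customField property, and the scroller and retry counts. It only assumes what the description above states. The stream framework retries the failing record, the scroller then terminates, and the command is reassigned to another scroller that runs the same query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the failure mode; none of these names are Nuxeo APIs.
class RetryAmplificationSketch {

    /** Stand-in for a read timeout raised by the storage backend driver. */
    static class QueryTimeoutException extends RuntimeException {
        QueryTimeoutException(String message) { super(message); }
    }

    /** Counts how many times the heavy query reached the backend. */
    static int backendQueries = 0;

    /** Models a query on a non-indexed field that always times out on a large repository. */
    static void runUnindexedQuery(String nxql) {
        backendQueries++;
        throw new QueryTimeoutException("Timeout while receiving message");
    }

    /**
     * Current behavior: the exception escapes the scroller, the record is retried,
     * then the scroller terminates and the command goes to the next scroller,
     * which runs the same query again.
     */
    static List<String> processWithoutCatching(String nxql, int scrollers, int retriesPerScroller) {
        List<String> log = new ArrayList<>();
        for (int s = 0; s < scrollers; s++) {
            for (int attempt = 0; attempt <= retriesPerScroller; attempt++) {
                try {
                    runUnindexedQuery(nxql);
                    log.add("scroller-" + s + ": success");
                    return log;
                } catch (QueryTimeoutException e) {
                    log.add("scroller-" + s + " attempt " + attempt + ": " + e.getMessage());
                }
            }
            log.add("scroller-" + s + ": terminated, command reassigned");
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> log = processWithoutCatching(
                "SELECT * FROM Document WHERE my:customField = 'x'", 3, 2);
        log.forEach(System.out::println);
        System.out.println("backend queries: " + backendQueries);
    }
}
```

With 3 scrollers and 2 retries each, one command sends the unindexed query to the backend 9 times. On MongoDB, each of those queries may keep running after the client has given up.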

On MongoDB, this translates into a MongoSocketReadTimeoutException:

2020-03-09T08:20:01,164 WARN  [scrollerPool-00,in:0,inCheckpoint:0,out:0,lastRead:1583741639788,lastTimer:0,wm:0,loop:30,rebalance assigned] [org.nuxeo.lib.stream.computation.AbstractComputation] Computation: scroller fails last record: command-00:+139, retrying ...
com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
	at com.mongodb.internal.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:537) ~[mongo-java-driver-3.8.1.jar:?]
...

	at org.nuxeo.ecm.core.storage.dbs.DBSCachingRepository.scroll(DBSCachingRepository.java:390) ~[nuxeo-core-storage-dbs-10.10-HF21.jar:?]
	at org.nuxeo.ecm.core.storage.dbs.DBSSession.scroll(DBSSession.java:1865) 
...
	at org.nuxeo.ecm.core.scroll.RepositoryScroll.fetch(RepositoryScroll.java:88) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
	at org.nuxeo.ecm.core.scroll.RepositoryScroll.hasNext(RepositoryScroll.java:81) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
	at org.nuxeo.ecm.core.bulk.computation.BulkScrollerComputation.processRecord(BulkScrollerComputation.java:122) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
	at org.nuxeo.ecm.core.bulk.computation.BulkScrollerComputation.lambda$processRecord$0(BulkScrollerComputation.java:108) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
	at org.nuxeo.runtime.transaction.TransactionHelper.lambda$runInTransaction$5(TransactionHelper.java:587) ~[nuxeo-runtime-jtajca-10.10-HF14.jar:?]
	at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:607) ~[nuxeo-runtime-jtajca-10.10-HF14.jar:?]
	at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:587) ~[nuxeo-runtime-jtajca-10.10-HF14.jar:?]
	at org.nuxeo.ecm.core.bulk.computation.BulkScrollerComputation.processRecord(BulkScrollerComputation.java:108) ~[nuxeo-core-bulk-10.10-HF21.jar:?]
	at org.nuxeo.lib.stream.computation.log.ComputationRunner.lambda$processRecordWithRetry$10(ComputationRunner.java:366) ~[nuxeo-stream-10.10-HF17.jar:?]
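
The timeout in the trace is raised inside the MongoDB driver and surfaces through the DBS storage and scroll layers. The sketch below shows one way a scroller could recognize it, assuming that inspecting the exception and its cause chain is enough. The class-name check avoids a compile-time dependency on mongo-java-driver. Treating java.util.concurrent.TimeoutException and java.net.SocketTimeoutException as query timeouts is an assumption of this sketch, not Nuxeo's actual logic. TimeoutDetectionSketch and isQueryTimeout are hypothetical names.

```java
import java.net.SocketTimeoutException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of Nuxeo.
class TimeoutDetectionSketch {

    /**
     * Returns true if the throwable or any of its causes looks like a query timeout.
     * Assumption: matching the MongoDB driver exception by class name keeps this
     * sketch free of a compile-time dependency on mongo-java-driver.
     */
    static boolean isQueryTimeout(Throwable t) {
        // Identity set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur instanceof TimeoutException || cur instanceof SocketTimeoutException) {
                return true;
            }
            if ("com.mongodb.MongoSocketReadTimeoutException".equals(cur.getClass().getName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("scroll failed",
                new SocketTimeoutException("Read timed out"));
        System.out.println(isQueryTimeout(wrapped));                               // true
        System.out.println(isQueryTimeout(new IllegalStateException("bad NXQL"))); // false
    }
}
```

The identity set guards against a cyclic cause chain, which Throwable.initCause does not fully prevent.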

The Scroller should catch this timeout and mark the command as completed in error.

This will avoid unnecessary retries that overwhelm the backend and potentially create duplicate downstream processing.
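
Here is a minimal sketch of the proposed behavior, assuming a simplified command status. CompleteInErrorSketch, CommandStatus, its fields and QueryTimeoutException are hypothetical stand-ins, not the Nuxeo Bulk Service API. The scroller catches the timeout, marks the command as completed in error, and returns normally. The record is therefore not retried and the command is not reassigned.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the fix; these names are illustrative, not Nuxeo APIs.
class CompleteInErrorSketch {

    /** Hypothetical marker for a backend query timeout (e.g. a MongoDB read timeout). */
    static class QueryTimeoutException extends RuntimeException {
        QueryTimeoutException(String message) { super(message); }
    }

    enum State { SCROLLING_RUNNING, RUNNING, COMPLETED }

    /** Hypothetical, simplified command status. */
    static final class CommandStatus {
        final String commandId;
        State state = State.SCROLLING_RUNNING;
        boolean inError;
        String errorMessage;
        long total;

        CommandStatus(String commandId) { this.commandId = commandId; }
    }

    /**
     * Materializes the document set. A timeout completes the command in error
     * instead of propagating, so the stream does not retry the record and no
     * other scroller re-runs the same heavy query. Other exceptions still propagate.
     */
    static void scroll(CommandStatus status, LongSupplier query) {
        try {
            status.total = query.getAsLong();
            status.state = State.RUNNING;
        } catch (QueryTimeoutException e) {
            status.state = State.COMPLETED;
            status.inError = true;
            status.errorMessage = "Query timed out, check that the queried fields are indexed: "
                    + e.getMessage();
        }
    }

    public static void main(String[] args) {
        CommandStatus ok = new CommandStatus("cmd-1");
        scroll(ok, () -> 42L);
        System.out.println(ok.state + " total=" + ok.total);                 // RUNNING total=42

        CommandStatus timedOut = new CommandStatus("cmd-2");
        scroll(timedOut, () -> { throw new QueryTimeoutException("Timeout while receiving message"); });
        System.out.println(timedOut.state + " inError=" + timedOut.inError); // COMPLETED inError=true
    }
}
```

Only the timeout is swallowed. Any other exception still propagates, so the existing retry policy keeps applying to failures that may be transient. That split is a choice made for this sketch.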

Issue Links

Is referenced in (2 issues)

People

Assignee:

Benoit Delbosc

Reporter:

Benoit Delbosc

Participants:

Benoit Delbosc, Jenkins, Support Tech User

Votes:

0

Watchers:

3

Dates

Created:

2021-03-05 09:41

Updated:

2021-09-09 13:06

Resolved:

2021-03-19 15:22