Nuxeo Platform / NXP-29587

Bulk scroller should automatically turn to produceImmediate

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.10
    • Fix Version/s: 10.10-HF38, 11.4, 2021.0
    • Component/s: Bulk
    • Release Notes Summary:
      The Bulk scroller writes records downstream when the number of documents reaches a configured threshold.
    • Upgrade notes:

      The scroller that materializes the document set of a bulk command now writes records downstream once the set exceeds 1 million documents, in order to prevent OOM. In that case, a failure of the scroller results in duplicate processing.
      The 1 million threshold can be configured through the configuration service:

      <!-- disable the immediate produce threshold -->
       <extension target="org.nuxeo.runtime.ConfigurationService" point="configuration">
           <property name="nuxeo.core.bulk.scroller.produceImmediateThreshold">0</property>
       </extension>

    • Team:
      PLATFORM
    • Sprint:
      nxplatform #21
    • Story Points:
      3

      Description

      By default the scroller is atomic: if the scroll fails, there is no downstream activity.

      The downstream records that contain the document ids are kept in memory and pushed downstream only once scrolling has completed.

      It is possible to handle millions of records this way, but it costs a few GB of memory.

      On larger repositories this can turn into an OOM; in this case the configuration should be adapted:

       <extension target="org.nuxeo.runtime.ConfigurationService" point="configuration">
           <property name="nuxeo.core.bulk.scroller.produceImmediate">true</property>
           ...
       </extension>

      With this setting, records are produced downstream while the scroll continues. If the scroll fails, the action is partially processed.

      We can alleviate the problem with a threshold: keep the atomic behavior unless the command has more than 100k documents.
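
      The threshold behavior described above can be illustrated with a minimal Java sketch. It is not the Nuxeo implementation: the class `ThresholdScrollerSketch` and its methods are hypothetical names, each record is assumed to carry a single document id, and a threshold of 0 is assumed to mean "never switch", mirroring the comment on `produceImmediateThreshold` in the upgrade notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: atomic buffering that switches to immediate produce past a threshold. */
final class ThresholdScrollerSketch {
    private final int threshold;                 // assumption: 0 or less disables the switch
    private final Consumer<String> downstream;   // stands in for the downstream record writer
    private final List<String> buffer = new ArrayList<>();
    private boolean produceImmediate;

    ThresholdScrollerSketch(int threshold, Consumer<String> downstream) {
        this.threshold = threshold;
        this.downstream = downstream;
    }

    /** Called for each document id returned by the scroll. */
    void onDocument(String docId) {
        if (produceImmediate) {
            // Past the threshold: nothing is held in memory anymore.
            downstream.accept(docId);
            return;
        }
        buffer.add(docId);
        if (threshold > 0 && buffer.size() > threshold) {
            // Too many ids to keep in memory: give up atomicity and flush.
            produceImmediate = true;
            flush();
        }
    }

    /** Called once the scroll has finished without failure. */
    void onScrollCompleted() {
        flush();
    }

    boolean isProduceImmediate() {
        return produceImmediate;
    }

    private void flush() {
        buffer.forEach(downstream);
        buffer.clear();
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        ThresholdScrollerSketch scroller = new ThresholdScrollerSketch(3, sent::add);
        for (int i = 1; i <= 5; i++) {
            scroller.onDocument("doc-" + i);
        }
        scroller.onScrollCompleted();
        System.out.println(sent.size() + " records sent, immediate=" + scroller.isProduceImmediate());
    }
}
```

      In this sketch, a scroll that fails before the threshold is crossed has sent nothing downstream, so the command stays atomic. Once the threshold is crossed, records already sent cannot be recalled, which is why a failure after that point leads to partial or duplicate processing.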


    Time Tracking

    • Estimated: Not Specified
    • Remaining: 0m
    • Logged: 5h
