Nuxeo Platform / NXP-28086

Bulk Service should have an option to use an Elasticsearch scroller

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.10
    • Fix Version/s: 10.10-HF21, 11.1, 2021.0
    • Component/s: Streams
    • Release Notes Summary:
      Elasticsearch scroller is usable with the Bulk Service.
    • Release Notes Description:
      The Bulk Service now offers several scroller options to materialize the document set. For now there are two: the repository scroller, which uses the repository backend search, and the elastic scroller, which uses Elasticsearch. The scroller can be configured at the Bulk Action level or at the Command level. The default is the repository scroller.
    • Backlog priority:
      700
    • Sprint:
      nxplatform 11.1.22, nxplatform 11.1.23, nxplatform 11.1.25, nxplatform 11.1.24, nxplatform 11.1.26
    • Story Points:
      1

      Description

      The Bulk Service scroller responsible for materializing the document set uses a repository query (with the scroll API to manage long-running queries).
      Since the goal of the Bulk Service is to process very large numbers of documents, it makes sense to query the repository through its backend, which is the source of truth.

      For some non-critical actions such as CSV export or AI export, we want to be able to use Elasticsearch to materialize the document set because:

      • the query may come from an Elasticsearch Page Provider and can only be performed by Elasticsearch when it uses ES hints or aggregation filters.
      • the query may use a full-text field that has been disabled at the repository level (nuxeo.vcs.fulltext.search.disabled=true) for performance reasons.

      We should provide an option at the action and/or command level to choose the type of scroller to use.
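
      One way to picture how the two levels could combine is the small shell sketch below. It assumes a precedence that this issue does not state explicitly: a scroller given on the command wins, otherwise the action's default is used, otherwise the repository scroller. The function name resolve_scroller is hypothetical and not part of Nuxeo.

```shell
#!/bin/sh
# Hypothetical helper illustrating an ASSUMED precedence:
# command-level scroller > action-level default > "repository".
resolve_scroller() {
  # $1 = scroller requested by the command (may be empty)
  # $2 = default scroller configured on the action (may be empty)
  printf '%s\n' "${1:-${2:-repository}}"
}

resolve_scroller "" ""                # prints "repository"
resolve_scroller "" elastic           # prints "elastic"
resolve_scroller repository elastic   # prints "repository"
```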

       

      To set the default scroller for a Bulk Action, use the defaultScroller option:

      <action name="csvExport" bucketSize="100" batchSize="50" httpEnabled="true" defaultScroller="elastic"
       validationClass="org.nuxeo.ecm.platform.csv.export.validation.CSVExportValidation"/> 

      To choose the scroller at the Bulk Command level when using a Page Provider, pass the scroll query parameter:

      curl -X POST "http://localhost:8080/nuxeo/api/v1/search/pp/default_search/bulk/csvExport?scroll=elastic"  -u Administrator:Administrator
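
      For scripting, the same request can be assembled by a small helper. This is a minimal sketch: the NUXEO_URL variable and the bulk_url function are hypothetical names introduced here, and only the elastic value of the scroll parameter is confirmed by this issue. Omitting the parameter presumably leaves the choice to the action's default scroller.

```shell
#!/bin/sh
# Sketch: build the Bulk Command URL for a Page Provider, optionally
# selecting the scroller. NUXEO_URL and bulk_url are hypothetical names.
NUXEO_URL="${NUXEO_URL:-http://localhost:8080/nuxeo}"

bulk_url() {
  # $1 = page provider name, $2 = bulk action name, $3 = optional scroller name
  url="$NUXEO_URL/api/v1/search/pp/$1/bulk/$2"
  if [ -n "$3" ]; then
    url="$url?scroll=$3"
  fi
  printf '%s\n' "$url"
}

# Without a scroller name, no scroll parameter is added to the URL.
bulk_url default_search csvExport
# With the Elasticsearch scroller, matching the curl command above.
bulk_url default_search csvExport elastic

# To submit the command (requires a running Nuxeo server):
# curl -X POST "$(bulk_url default_search csvExport elastic)" -u Administrator:Administrator
```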

       


    Time Tracking

    • Estimated: 0m
    • Remaining: 0m
    • Logged: 45m