Nuxeo Platform / NXP-21227

Improve Elasticsearch indexing throughput by optimizing bulk payload


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 8.10
    • Fix Version/s: 9.1
    • Component/s: Elasticsearch

      Description

      Currently, the bulk indexing command sent to Elasticsearch (ES) can only be limited by a number of documents (elasticsearch.reindex.bucketWriteSize).
      However, the optimal bulk payload size is 5-15 MB.

      The bulk command should be sent as soon as either:

      • the number of documents reaches elasticsearch.reindex.bucketWriteSize
      • or the bulk payload size reaches the elasticsearch.index.bulkMaxSize threshold

      This prevents sending oversized bulk indexing commands that overwhelm ES.
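      The dual-threshold rule above can be sketched as follows. This is a minimal Java sketch, not the Nuxeo implementation: the BulkBuffer class, its maxDocs/maxBytes limits and the sender callback are hypothetical stand-ins for elasticsearch.reindex.bucketWriteSize, elasticsearch.index.bulkMaxSize and the actual ES bulk call. The payload size is approximated by the UTF-8 length of each serialized request.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: accumulates serialized index requests and sends them
 * as one bulk command when EITHER the document count reaches maxDocs
 * (cf. elasticsearch.reindex.bucketWriteSize) OR the payload size reaches
 * maxBytes (cf. elasticsearch.index.bulkMaxSize).
 */
public class BulkBuffer {
    private final int maxDocs;
    private final long maxBytes;
    private final Consumer<List<String>> sender; // hypothetical: sends one bulk command to ES
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    public BulkBuffer(int maxDocs, long maxBytes, Consumer<List<String>> sender) {
        this.maxDocs = maxDocs;
        this.maxBytes = maxBytes;
        this.sender = sender;
    }

    /** Appends one serialized request, then flushes if either threshold is reached. */
    public void add(String request) {
        pending.add(request);
        pendingBytes += request.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxDocs || pendingBytes >= maxBytes) {
            flush();
        }
    }

    /** Sends whatever is pending as a single bulk command and resets the counters. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }

    public static void main(String[] args) {
        List<Integer> bulkSizes = new ArrayList<>();
        // Limits: at most 100 documents or 1,000 bytes per bulk command.
        BulkBuffer buffer = new BulkBuffer(100, 1_000, batch -> bulkSizes.add(batch.size()));
        String doc = "x".repeat(300); // a 300-byte document
        for (int i = 0; i < 10; i++) {
            buffer.add(doc);
        }
        buffer.flush(); // send the remainder
        System.out.println(bulkSizes); // [4, 4, 2]
    }
}
```

      With the limits used in main, the 300-byte documents hit the size limit long before the count limit, so ten documents go out as bulks of 4, 4 and 2. Because the check runs after appending, a bulk can exceed maxBytes by up to one document; checking before appending would keep every bulk strictly under the limit at the cost of a little more bookkeeping.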

      Other improvements (not covered in this ticket) could be:

      • send the bulk command if building it takes longer than a timeout, to prevent long-running transactions
      • schedule a new job after this timeout to avoid blocking the indexing chain.

      A concrete case where ES indexing is a bottleneck is required before implementing these last improvements.
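      Purely as an illustration of the first follow-up idea (which this ticket does not implement), a time-based flush could look like the hypothetical TimedBulkBuffer below. The injected clock (a LongSupplier returning milliseconds) is an assumption made so the timing can be tested; in production it could be System::currentTimeMillis. The sender callback is likewise a hypothetical stand-in for the ES bulk call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of a follow-up idea not covered by this ticket:
 * send the pending bulk command once building it has taken longer than a
 * timeout, so a slow producer does not keep a long-running transaction open.
 */
public class TimedBulkBuffer {
    private final long timeoutMillis;
    private final LongSupplier clock; // current time in ms, injectable for testing
    private final Consumer<List<String>> sender; // hypothetical: sends one bulk command to ES
    private final List<String> pending = new ArrayList<>();
    private long startedAt; // time at which the first pending request was added

    public TimedBulkBuffer(long timeoutMillis, LongSupplier clock, Consumer<List<String>> sender) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.sender = sender;
    }

    /** Appends a request; flushes if the current bulk has been building for too long. */
    public void add(String request) {
        if (pending.isEmpty()) {
            startedAt = clock.getAsLong();
        }
        pending.add(request);
        if (clock.getAsLong() - startedAt >= timeoutMillis) {
            flush();
        }
    }

    /** Sends whatever is pending as a single bulk command. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock, in milliseconds
        List<Integer> sizes = new ArrayList<>();
        TimedBulkBuffer buffer = new TimedBulkBuffer(1_000, () -> now[0], b -> sizes.add(b.size()));
        for (int i = 0; i < 5; i++) {
            buffer.add("doc" + i);
            now[0] += 400; // each document takes 400 ms to build
        }
        buffer.flush(); // send the remainder
        System.out.println(sizes); // [4, 1]
    }
}
```

      In this sketch the timeout is only checked when a document is added, so an idle buffer never flushes by itself; covering that case is what the second idea (scheduling a new job after the timeout) would address.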

    People

    • Votes: 0
    • Watchers: 2

    Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0m
    • Time Spent: 1d