Nuxeo Platform / NXP-15826

Improve Elasticsearch reindex all repository

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.8.0-HF24, 6.0
    • Fix Version/s: 5.8.0-HF28, 6.0-HF02, 7.1
    • Component/s: Elasticsearch

      Description

      When reindexing the repository, documents are indexed recursively, starting from the root document.
      Some documents have no parent id and are therefore never reindexed (Tag, Tagging, DefaultRelation, versions, ...).

      Reindexing from a root docid is useful for updating part of the repository,
      but reindexing the whole repository requires a different approach.
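      A minimal sketch of the gap described above, assuming a hypothetical in-memory repository that maps each document id to its parent id (None when there is none). A tree walk from the root (done here with an explicit stack rather than recursion, which visits the same documents) misses the parentless documents, while a flat query over all ids does not. The ids and helpers are illustrative only, not Nuxeo APIs.

```python
from collections import defaultdict

# Hypothetical in-memory repository: document id -> parent id (None = no parent).
DOCS = {
    "root": None,
    "folder": "root",
    "file": "folder",
    "tag-1": None,    # e.g. a Tag: lives outside the folder tree
    "file-v1": None,  # e.g. a version: has no parent id
}

def index_recursively(docs, start):
    """Old approach: walk the tree downward from `start`."""
    children = defaultdict(list)
    for doc_id, parent in docs.items():
        if parent is not None:
            children[parent].append(doc_id)
    indexed, stack = [], [start]
    while stack:
        doc_id = stack.pop()
        indexed.append(doc_id)
        stack.extend(children[doc_id])
    return indexed

def index_by_query(docs):
    """New approach: take every id a flat query returns, whatever its parent."""
    return list(docs)

missed = set(index_by_query(DOCS)) - set(index_recursively(DOCS, "root"))
print(sorted(missed))  # ['file-v1', 'tag-1']
```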

      New implementation:
      A scrolling worker retrieves the list of document ids matching an NXQL query. It splits this list into buckets and launches a bucket worker for each bucket.
      Each bucket worker submits its documents to Elasticsearch in bulk mode.
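      The scrolling/bucket split can be sketched as follows. This is an illustration under stated assumptions, not the Nuxeo implementation: `scroll_ids` is a hypothetical stand-in for scrolling NXQL query results page by page, the workers run as plain function calls rather than asynchronous workers, and `send_bulk` is a hypothetical callback standing in for one Elasticsearch bulk request.

```python
from typing import Callable, Iterable, Iterator, List

def scroll_ids(all_ids: List[str], page_size: int) -> Iterator[List[str]]:
    """Hypothetical stand-in for scrolling NXQL query results page by page."""
    for start in range(0, len(all_ids), page_size):
        yield all_ids[start:start + page_size]

def bucket_worker(bucket: List[str], write_size: int,
                  send_bulk: Callable[[List[str]], None]) -> None:
    """Submit one bucket in bulk commands of at most write_size documents."""
    for start in range(0, len(bucket), write_size):
        send_bulk(bucket[start:start + write_size])

def scrolling_worker(pages: Iterable[List[str]], read_size: int, write_size: int,
                     send_bulk: Callable[[List[str]], None]) -> int:
    """Group scrolled ids into buckets of read_size and hand each bucket
    to a bucket worker. Returns the number of buckets dispatched."""
    bucket: List[str] = []
    dispatched = 0
    for page in pages:
        for doc_id in page:
            bucket.append(doc_id)
            if len(bucket) == read_size:
                bucket_worker(bucket, write_size, send_bulk)
                dispatched += 1
                bucket = []
    if bucket:  # last, partially filled bucket
        bucket_worker(bucket, write_size, send_bulk)
        dispatched += 1
    return dispatched

# Usage with small sizes: 12 ids, buckets of 5, bulk commands of 2.
bulks: List[List[str]] = []
ids = [f"doc-{i}" for i in range(12)]
n = scrolling_worker(scroll_ids(ids, page_size=4), read_size=5,
                     write_size=2, send_bulk=bulks.append)
print(n, [len(b) for b in bulks])  # 3 [2, 2, 1, 2, 2, 1, 2]
```

      Because the ids come from a flat query rather than a tree walk, documents without a parent id are included like any other.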

      The default bucket size is 500 documents; it can be tuned using elasticsearch.reindex.bucketReadSize.
      The default number of documents per bulk command is 50; it can be tuned using elasticsearch.reindex.bucketWriteSize.
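      A rough sizing sketch for the two properties, assuming a plain dict stands in for wherever the platform reads configuration properties (how Nuxeo actually loads them is not shown). `reindex_sizes` and `plan` are hypothetical helpers: they apply the documented defaults (500 and 50) and estimate how many bucket workers and bulk commands a full reindex of N documents would produce.

```python
from typing import Mapping, Tuple

DEFAULT_READ_SIZE = 500   # elasticsearch.reindex.bucketReadSize
DEFAULT_WRITE_SIZE = 50   # elasticsearch.reindex.bucketWriteSize

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)

def reindex_sizes(props: Mapping[str, str]) -> Tuple[int, int]:
    """Read both sizes from a hypothetical property map, falling back to defaults."""
    read = int(props.get("elasticsearch.reindex.bucketReadSize", DEFAULT_READ_SIZE))
    write = int(props.get("elasticsearch.reindex.bucketWriteSize", DEFAULT_WRITE_SIZE))
    return read, write

def plan(total_docs: int, read: int, write: int) -> Tuple[int, int]:
    """Return (bucket workers, bulk commands) for total_docs documents."""
    buckets = ceil_div(total_docs, read)
    full, rest = divmod(total_docs, read)
    bulks = full * ceil_div(read, write) + ceil_div(rest, write)
    return buckets, bulks

print(plan(1234, *reindex_sizes({})))  # (3, 25)
print(plan(1234, *reindex_sizes({
    "elasticsearch.reindex.bucketReadSize": "1000",
    "elasticsearch.reindex.bucketWriteSize": "100",
})))  # (2, 13)
```

      With the defaults, 1,234 documents give buckets of 500, 500 and 234, i.e. 10 + 10 + 5 = 25 bulk commands.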
