  1. Nuxeo Platform
  2. NXP-22110

Provide an Elasticsearch indexing implementation with Computations

    Details

      Description

      Goal:
      1. Improve indexing throughput by merging indexing commands across threads
      2. Be able to replicate a stream of Elasticsearch updates, so it can be used to sync a remote ES cluster

      HowTo:
      The Nuxeo indexing commands are written to a stream.
      The following computations are run:

      • convert each command into a list of document ids to index
      • batch the ids and remove duplicates, fetch the documents and create an Elasticsearch bulk payload
      • read the bulk payloads and send them to ES
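
      The first two computations above can be sketched as follows. This is only an illustration under stated assumptions: the class, the `IndexingCommand` type, the in-memory `store` map standing in for the document fetch, and the index name are all hypothetical and do not come from the Nuxeo code base. The payload follows the newline-delimited JSON shape expected by the Elasticsearch bulk API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Nuxeo code base.
class IndexingPipelineSketch {

    // Hypothetical indexing command: one command may target several documents.
    static final class IndexingCommand {
        final List<String> docIds;

        IndexingCommand(List<String> docIds) {
            this.docIds = docIds;
        }
    }

    // Computation 1: convert commands into a flat list of document ids.
    static List<String> toDocIds(List<IndexingCommand> commands) {
        List<String> ids = new ArrayList<>();
        for (IndexingCommand command : commands) {
            ids.addAll(command.docIds);
        }
        return ids;
    }

    // Computation 2a: remove duplicates within a batch, keeping first-seen order.
    static List<String> dedup(List<String> ids) {
        return new ArrayList<>(new LinkedHashSet<>(ids));
    }

    // Computation 2b: build a bulk payload (one action line, then one source line
    // per document). The store map is an assumption standing in for the document
    // fetch; ids are assumed not to need JSON escaping.
    static String buildBulkPayload(String index, List<String> ids, Map<String, String> store) {
        StringBuilder payload = new StringBuilder();
        for (String id : ids) {
            String source = store.get(id);
            if (source == null) {
                continue; // document no longer exists, nothing to index
            }
            payload.append("{\"index\":{\"_index\":\"").append(index)
                   .append("\",\"_id\":\"").append(id).append("\"}}\n");
            payload.append(source).append('\n');
        }
        return payload.toString();
    }
}
```

      Because deduplication happens on the batch read from the stream, commands written by different threads for the same document collapse into a single bulk entry.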

      Improvement:

      • use another topology for sync indexing but share the same final computation
      • remove the dedup logic from the sync listener and move it to the computation
      • remove all indexing workers
      • for the last step, shard by JSON size so that big documents are sent to the same partition. This way small documents are not blocked by slow indexing commands: small documents are indexed in priority across multiple partitions, while big documents are indexed in the background.
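
      The size-based sharding in the last item could look roughly like the sketch below. The class name, the method, the size threshold parameter, and the choice of partition 0 for big payloads are all assumptions made for illustration; the ticket does not specify them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of sharding bulk payloads by JSON size.
class SizeShardingSketch {

    // Returns the partition for a payload:
    // - partition 0 is reserved for big payloads (assumption),
    // - the remaining partitions share small payloads, spread by key hash.
    static int choosePartition(String key, String json, int partitions, int bigThresholdBytes) {
        if (partitions < 2) {
            throw new IllegalArgumentException("need at least 2 partitions");
        }
        int sizeInBytes = json.getBytes(StandardCharsets.UTF_8).length;
        if (sizeInBytes >= bigThresholdBytes) {
            return 0; // all big documents go to the same partition
        }
        // floorMod keeps the result non-negative even for negative hash codes
        return 1 + Math.floorMod(key.hashCode(), partitions - 1);
    }
}
```

      With this layout a slow bulk request for a large document only delays other large documents, while the small ones keep flowing through the remaining partitions.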

              People

              • Assignee:
                bdelbosc Benoit Delbosc
                Reporter:
                bdelbosc Benoit Delbosc