-
Type: Task
-
Status: Open
-
Priority: Minor
-
Resolution: Unresolved
-
Affects Version/s: None
-
Fix Version/s: QualifiedToSchedule
-
Component/s: Elasticsearch, Streams
-
Epic Link:
-
Tags:
-
Story Points: 13
Goal:
1. Improve indexing throughput by merging indexing commands across threads.
2. Be able to replicate a stream of Elasticsearch updates, so that it can be used to sync a remote ES cluster.
HowTo:
The Nuxeo indexing commands are written to a stream.
The following computations are run:
- convert each command into a list of document ids to index
- batch the ids and remove duplicates, fetch the documents, and create an Elasticsearch bulk payload
- read the bulk payloads and send them to ES
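A minimal Python sketch of the three computations, assuming an in-memory list of commands in place of the Nuxeo stream, a plain lookup function in place of document fetching, and a list standing in for the HTTP call to ES. The names `command_to_doc_ids`, `batch_dedup`, `to_bulk_payload`, `send_bulk`, `run_pipeline` and the `doc_ids` command field are hypothetical, not Nuxeo APIs. The payload follows the Elasticsearch `_bulk` NDJSON format: one action line followed by one source line per document, ending with a newline.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical command shape {"doc_ids": [...]}; real Nuxeo commands differ.
Command = Dict[str, List[str]]


def command_to_doc_ids(command: Command) -> List[str]:
    """Computation 1: convert an indexing command into document ids."""
    return list(command.get("doc_ids", []))


def batch_dedup(ids: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Computation 2a: group ids into batches, skipping ids already in the current batch."""
    batch: List[str] = []
    seen = set()
    for doc_id in ids:
        if doc_id in seen:
            continue  # duplicate within the current batch
        seen.add(doc_id)
        batch.append(doc_id)
        if len(batch) == batch_size:
            yield batch
            batch, seen = [], set()  # the dedup window is one batch
    if batch:
        yield batch


def to_bulk_payload(batch: List[str], fetch: Callable[[str], dict], index: str) -> str:
    """Computation 2b: fetch each document and build an Elasticsearch _bulk NDJSON body."""
    lines = []
    for doc_id in batch:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(fetch(doc_id)))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


def send_bulk(payload: str, sink: List[str]) -> None:
    """Computation 3: send the payload; `sink` stands in for a POST to /_bulk."""
    sink.append(payload)


def run_pipeline(commands: Iterable[Command], fetch: Callable[[str], dict],
                 index: str, batch_size: int, sink: List[str]) -> None:
    """Chain the computations: commands -> ids -> deduplicated batches -> bulk payloads."""
    ids = (doc_id for cmd in commands for doc_id in command_to_doc_ids(cmd))
    for batch in batch_dedup(ids, batch_size):
        send_bulk(to_bulk_payload(batch, fetch, index), sink)
```

For example, two commands touching `a`, `b` and then `a` again, with a batch size of 10, produce a single payload containing `a` once: merging commands from different sources before fetching is what saves the duplicate fetch and the duplicate ES write.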
Improvement:
- use another topology for synchronous indexing, but share the same final computation
- remove the deduplication logic from the sync listener and move it into the computation
- remove all indexing workers
- for the last step, shard by JSON size so that big documents are sent to the same partition. This way small documents are not blocked by slow indexing commands: small documents are indexed in priority across multiple partitions, while big documents are indexed in the background.
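A minimal sketch of sharding per JSON size, assuming partition 0 is reserved for big documents and small documents are spread over the remaining partitions by a stable hash of the document id. `choose_partition`, `route`, `BIG_PARTITION` and the `big_threshold` value are hypothetical illustrations, not existing Nuxeo code or settings.

```python
import zlib
from typing import Dict, List

BIG_PARTITION = 0  # hypothetical: partition reserved for large documents


def choose_partition(doc_id: str, json_size: int, partitions: int, big_threshold: int) -> int:
    """Route large JSON payloads to one partition and spread small ones over the rest."""
    if partitions < 2:
        raise ValueError("need one partition for big documents and at least one for small ones")
    if json_size >= big_threshold:
        return BIG_PARTITION
    # crc32 is stable across processes, unlike Python's salted str hash()
    return 1 + zlib.crc32(doc_id.encode("utf-8")) % (partitions - 1)


def route(payloads: Dict[str, str], partitions: int, big_threshold: int) -> Dict[int, List[str]]:
    """Assign each document id to a partition based on the byte size of its JSON."""
    routed: Dict[int, List[str]] = {p: [] for p in range(partitions)}
    for doc_id, json_body in payloads.items():
        size = len(json_body.encode("utf-8"))
        routed[choose_partition(doc_id, size, partitions, big_threshold)].append(doc_id)
    return routed
```

Hashing small documents by id keeps successive updates of the same document on the same partition, so their order is preserved. A document whose JSON crosses the threshold between two updates would change partition; this sketch does not handle that case.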