[NXP-22110] Provides an Elasticsearch indexing impl with Computations - Nuxeo Issue Tracker

Details

Type: Task
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: QualifiedToSchedule
Component/s: Elasticsearch, Streams

Epic Link: Async Infra - Plug Indexation
Tags: nxplatform
Story Points: 13

Description

Goal:
1. Improve indexing throughput by merging indexing commands across threads.
2. Be able to replicate a stream of Elasticsearch updates, so that it can be used to sync a remote ES cluster.

HowTo:
The Nuxeo indexing commands are written to a stream.
The following computations are run:

1. Convert each command into a list of document ids to index.
2. Batch the ids and remove duplicates, fetch the documents, and create an Elasticsearch bulk payload.
3. Read the bulk payloads and send them to ES.
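
The three computations above can be sketched as plain functions. This is only an illustration of the data flow, not the Nuxeo implementation: the command shape (`doc_id`, `recurse`), the `children_of`, `fetch` and `post` callbacks, and the index name are all hypothetical. The payload follows the newline-delimited action/source layout of the Elasticsearch bulk API.

```python
import json
from typing import Callable, Iterable, List


def command_to_ids(command: dict, children_of: Callable[[str], List[str]]) -> List[str]:
    """Step 1: expand one indexing command into the document ids to index.

    The command shape {"doc_id": ..., "recurse": ...} is an assumption.
    """
    ids = [command["doc_id"]]
    if command.get("recurse"):
        ids.extend(children_of(command["doc_id"]))
    return ids


def build_bulk_payload(doc_ids: Iterable[str], fetch: Callable[[str], dict], index: str) -> str:
    """Step 2: drop duplicate ids, fetch each document once, build one bulk body."""
    lines = []
    # dict.fromkeys keeps the first occurrence of each id, in order.
    for doc_id in dict.fromkeys(doc_ids):
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(fetch(doc_id)))
    if not lines:
        return ""
    # The bulk format is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(payload: str, post: Callable[[str], None]) -> None:
    """Step 3: hand a non-empty payload to whatever sends it to ES."""
    if payload:
        post(payload)
```

In this sketch, two commands targeting the same document within one batch produce a single index action, which is where the throughput gain from merging commands would come from. Because step 3 only reads finished payloads, the same payload stream could also be consumed by a second sender targeting a remote cluster.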

Improvement:

- Use another topology for synchronous indexing, but share the same final computation.
- Remove the dedup logic from the sync listener and move it to the computation.
- Remove all indexing workers.
- For the last step, shard by JSON size so that big documents are sent to the same partition. This way small documents are not blocked by slow indexing commands: small documents are indexed in priority across multiple partitions, while big documents are indexed in the background.
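
One possible way to shard by JSON size is sketched below. The function name, the 1 MB threshold, and the choice of reserving the last partition for big documents are assumptions made for illustration; the issue does not specify them.

```python
import zlib


def pick_partition(doc_id: str, json_size: int, partitions: int,
                   big_threshold: int = 1_000_000) -> int:
    """Route big documents to one reserved partition, spread the rest by id hash.

    big_threshold (in bytes) is a hypothetical cut-off, not a value from the issue.
    """
    if partitions < 2:
        # Nothing to separate with a single partition.
        return 0
    if json_size >= big_threshold:
        # All big documents share the last partition.
        return partitions - 1
    # Small documents are spread over the remaining partitions.
    return zlib.crc32(doc_id.encode("utf-8")) % (partitions - 1)
```

With 4 partitions, a 5 MB document always lands on partition 3, while small documents are distributed over partitions 0 to 2 and are never queued behind it.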

Issue Links

is related to

NXP-24335 BAF: Resilient Bulk Actions Framework (Resolved)
NXP-26032 Create a Bulk Action for indexing documents (Resolved)
NXP-24319 Dedicated background/slow work queues (Resolved)

People

Assignee: Benoit Delbosc
Reporter: Benoit Delbosc
Participants: Benoit Delbosc
Votes: 0
Watchers: 3

Dates

Created: 2017-04-11 09:41
Updated: 2020-02-25 13:30