- Type: Sub-task
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: ADDONS_9.10
- Component/s: Clustering
- Principle
The goal is to build a pipeline to replicate Elasticsearch indexes between 2 remote Nuxeo instances.
The pipe uses Kafka as a vehicle to transmit operations between the source and the destination Nuxeo instance.
This pipe is composed of 2 parts:
- a part that logs Nuxeo ES related operations inside Kafka
- this part is a Nuxeo Plugin (a minimal sketch follows this list)
- a part that moves data from Kafka to ES
- this part is a custom Kafka Connect connector
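To make the first part concrete, here is a minimal sketch assuming a plain Kafka producer that publishes each ES operation as a JSON message; the class name, topic name, and message layout are illustrative assumptions, not the actual plugin code:
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EsOperationLogger { // hypothetical name

    // hypothetical topic; the real plugin may name it differently
    private static final String TOPIC = "nuxeo-es-replication";

    private final KafkaProducer<String, String> producer;

    public EsOperationLogger(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // wait for full acknowledgment so no operation is silently lost
        props.put("acks", "all");
        this.producer = new KafkaProducer<>(props);
    }

    /** Log one ES operation (index or delete) for a document. */
    public void log(String docId, String operationJson) {
        // keying by document id keeps all operations on a given document
        // in the same partition, preserving their relative order
        producer.send(new ProducerRecord<>(TOPIC, docId, operationJson));
    }

    public void close() {
        producer.close();
    }
}
```
The second part then consumes this topic on the destination side: the custom Kafka Connect sink replays each operation against the target ES.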
- Why not simply use Logstash?
As a first approach, we could consider using Logstash:
ES1 --Logstash--> Kafka DC1 --MirrorMaker--> Kafka DC2 --Logstash--> ES2
Creating a Logstash pipe with Elasticsearch and Kafka is rather simple since ES and Kafka are both standard input/output plugins.
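For illustration, the first hop (ES1 to Kafka DC1) could be a pipeline along these lines; host, index, and topic names are placeholders:
```
input {
  elasticsearch {
    hosts => ["es1:9200"]
    index => "nuxeo"
    query => '{ "query": { "match_all": {} } }'
  }
}
output {
  kafka {
    bootstrap_servers => "kafka-dc1:9092"
    topic_id => "nuxeo-replication"
  }
}
```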
However, the ES input plugin has a major drawback: it is configured with an ES query and simply pipes all the matching JSON documents into the output plugin.
As a result, this works for an initial replication, but since I did not find a "tail mode", we would have to periodically re-run the Logstash pipe with a query that matches only the recently added/updated entries.
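Emulating a tail mode would mean swapping in a range query at each run, for example as below (here "dc:modified" stands in for whatever last-modification timestamp field is available), and persisting the highest timestamp seen between runs:
```
input {
  elasticsearch {
    hosts => ["es1:9200"]
    index => "nuxeo"
    # the lower bound must be updated and persisted between runs
    query => '{ "query": { "range": { "dc:modified": { "gte": "2018-01-01T00:00:00Z" } } } }'
  }
}
```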
While technically we could have a script that runs Logstash in a loop, changing the query at each iteration, my fear is that:
- keeping a reliable offset may be tricky
- the cost on the ES side may end up being significant
- this will not work for deletions