• Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: ADDONS_9.10
    • Component/s: Clustering


      1. Principle

      The goal is to build a pipeline to replicate Elasticsearch indices between two remote Nuxeo instances.

      The pipe uses Kafka as a vehicle to transmit operations between the source and the destination Nuxeo instance.

      This pipe is composed of two parts:

      • a part that logs Nuxeo ES-related operations into Kafka; this part is a Nuxeo plugin
      • a part that moves data from Kafka to ES; this part is a custom Kafka Connect sink
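      The Kafka Connect part could be registered with a configuration along these lines. This is only a sketch: the connector class, topic name, and hosts are hypothetical placeholders, not the actual plugin's identifiers.

      ```json
      {
        "name": "nuxeo-es-sink",
        "config": {
          "connector.class": "org.nuxeo.kafka.connect.EsSinkConnector",
          "tasks.max": "1",
          "topics": "nuxeo-es-operations",
          "elasticsearch.hosts": "http://es2.example.com:9200"
        }
      }
      ```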

      2. Why not simply use LogStash

      As a first approach, we could consider using LogStash:

          ES1 --LogStash--> Kafka DC1 --MirrorMaker--> Kafka DC2 --LogStash--> ES2

      Creating a LogStash pipe with Elasticsearch and Kafka is rather simple since ES and Kafka are both standard Input/Output plugins.
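      For reference, such a pipe is only a few lines of LogStash configuration. This is a sketch; the hosts, index, and topic names are placeholders:

      ```
      input {
        elasticsearch {
          hosts => ["es1.example.com:9200"]
          index => "nuxeo"
          query => '{ "query": { "match_all": {} } }'
        }
      }
      output {
        kafka {
          bootstrap_servers => "kafka.example.com:9092"
          topic_id => "nuxeo-replication"
        }
      }
      ```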

      However, the ES input plugin has a major drawback: it is configured using an ES query and simply pipes all the JSON documents into the output plugin.

      As a result, this will work for an initial replication; but since I did not find a "tail mode", we would have to periodically re-run the LogStash pipe with a query that matches only the recently added/updated entries.

      While technically we could have a script that runs LogStash in a loop, changing the query at each iteration, my fear is that:

      • having a reliable offset may be tricky
      • the cost on the ES side may end up being significant
      • this will not work for deletions
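      The offset problem above can be made concrete with a small sketch. The field name `dc:modified` and the query shape are assumptions for illustration; this only shows how each loop iteration would have to rebuild a timestamp-based range query, which is exactly where the fragility lies.

      ```python
      # Hypothetical sketch of the "run LogStash in a loop" workaround:
      # each iteration re-queries ES for documents modified since the last run.

      def build_incremental_query(last_offset: str) -> dict:
          """Build the ES query one loop iteration would use.

          'dc:modified' is an assumed field name for illustration.
          """
          return {
              "query": {
                  "range": {
                      "dc:modified": {"gt": last_offset}
                  }
              }
          }

      # The offset is a timestamp, not a log position: a clock-skewed or
      # late-arriving document with an older timestamp is silently skipped,
      # and a deleted document never matches any range query at all.
      q = build_incremental_query("2018-01-01T00:00:00Z")
      ```
      
      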




            • Assignee:
              tdelprat Thierry Delprat
            • Votes:
              0 Vote for this issue