- Type: Sub-task
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: ADDONS_9.10
- Component/s: Clustering
For multi-datacenter deployments, we may be able to assume that the infrastructure itself will provide a way to sync blobs.
However, if this is not the case, we can consider several options:
RSync
Principles
We run rsync between two nodes, one in each DC, each of them connected to the underlying filesystem used by the BlobManager. A rough driver is sketched below.
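As an illustration only (the paths, target host, and interval are placeholders, and shelling out to rsync from a scheduled Java task is just one way to drive it):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical driver: runs rsync on a fixed schedule between the two stores.
public class RsyncBlobSync {
    public static void main(String[] args) {
        Executors.newSingleThreadScheduledExecutor().scheduleWithFixedDelay(() -> {
            try {
                // -a: archive mode, -z: compress, --partial: resume interrupted transfers
                int exit = new ProcessBuilder("rsync", "-az", "--partial",
                        "/var/lib/nuxeo/binaries/", "drp-node:/var/lib/nuxeo/binaries/")
                        .inheritIO().start().waitFor();
                if (exit != 0) {
                    System.err.println("rsync exited with code " + exit);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 15, TimeUnit.MINUTES);
    }
}
```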
Advantages
Almost nothing to do.
Limitations
Hard to know where we are in the replication process, and hence hard to estimate how much data would be lost in a failover.
Unclear how it will scale on very large storage with several million files.
BlobManager with async remote write
Principles
We write a BlobManager that will also write, asynchronously, the blobs to the DRP location.
This is a "double write" approach.
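A minimal sketch of the double-write idea, assuming a simplified store interface rather than the actual Nuxeo BinaryManager contract:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Stand-in interface, not the real Nuxeo BinaryManager contract.
interface SimpleBlobStore {
    void write(String digest, InputStream in) throws Exception;
}

// Writes locally first (synchronously), then replays the write against the
// DRP location on a background executor.
public class AsyncReplicatingBlobStore implements SimpleBlobStore {

    private final Path localRoot;
    private final SimpleBlobStore remote; // e.g. an S3-backed store on the DRP site
    private final ExecutorService replicator = Executors.newFixedThreadPool(4);

    public AsyncReplicatingBlobStore(Path localRoot, SimpleBlobStore remote) {
        this.localRoot = localRoot;
        this.remote = remote;
    }

    @Override
    public void write(String digest, InputStream in) throws Exception {
        Path local = localRoot.resolve(digest);
        // Synchronous local write: this is the caller's durability guarantee.
        Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
        // Asynchronous remote write: failures must be retried or at least
        // surfaced, otherwise the DRP copy silently drifts from the primary.
        replicator.submit(() -> {
            try (InputStream replica = Files.newInputStream(local)) {
                remote.write(digest, replica);
            } catch (Exception e) {
                System.err.println("DRP replication failed for " + digest + ": " + e);
            }
        });
    }
}
```

The complexity the limitation below refers to is everything this sketch glosses over: retries, ordering, and what happens to queued writes if the node crashes.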
Advantages
No external process to run.
Limitations
Handling the async write may be complex. It would be straightforward with an S3 client, but in that case, if we use S3 on AWS, we could simply rely on built-in features such as cross-region replication, or on an event-triggered Lambda, instead.
BlobManager + nuxeo-stream + copy process
Principles
We can use the same principle as the one used for ES (see the consumer sketch after this list):
- have a custom Binary Manager that:
  - in addition to writing the Blob, also adds an entry in nuxeo-stream (just the digest of the Blob to be added)
  - exposes an HTTP API to retrieve the Blob based on its digest
- have a Kafka Connect process running on the target DC that:
  - gets the digest from Kafka
  - fetches the remote Blob and writes it
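A minimal sketch of the copy process on the target DC, using a plain Kafka consumer for brevity rather than Kafka Connect; the topic name, source endpoint, and target path are illustrative placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Consumes blob digests from a Kafka topic fed by the custom Binary Manager,
// then pulls each blob over the HTTP API it exposes and writes it locally.
public class BlobCopyWorker {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-dc2:9092");
        props.put("group.id", "blob-copy");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        HttpClient http = HttpClient.newHttpClient();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("blob-digests"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    String digest = record.value();
                    // Fetch the blob by digest from the source DC's HTTP API.
                    HttpRequest request = HttpRequest.newBuilder(
                            URI.create("https://dc1.example.com/blobs/" + digest)).build();
                    http.send(request,
                            HttpResponse.BodyHandlers.ofFile(
                                    Path.of("/var/lib/nuxeo/binaries", digest)));
                }
            }
        }
    }
}
```

Tracking progress then amounts to watching the consumer group's offset lag on the topic.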
Advantages
- we can track progress
- the process is automatic and reliable
Limitations
More custom code to deploy and maintain.