- Type: Epic
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: None
- Component/s: Synchronizer
https://docs.google.com/presentation/d/15rPZQZVTY9d8iVdiV50ViPSnwAvpd36EdsloZptRabY/edit?usp=sharing
-------------------------------------------------------------------------
It seems that we have not made much progress on this topic.
Reading the slides that Yannis put together as a summary of the discussions we had in Paris, I am not sure we captured everything, so it may make sense to initiate a proper spec.
Sharing below the notes I made prior to the meeting in Paris: hopefully, they can be used as a starting point.
As already discussed, it would be great to draft a first architecture and discuss it together.
Goals
Scalability
Currently, the Drive server-side workload directly increases with both the number of documents synchronized and the number of connected customers.
If we want Drive to be usable at scale, we need to change this model.
Independence
Currently, the Drive team's skills are mainly in Python whereas the Drive server side is pure Java.
This makes it more difficult for the Drive team to efficiently address the performance issues, since they always rely on another team to make changes and adjustments on the server side.
Strategy
Push vs Pull
In the current model, all Drive clients poll the server on a regular basis to fetch the changes.
By definition, adding clients or adding documents will increase the server-side load.
In order to have a workload that we can control, we need to reverse the model and rely on push:
- define what resources we want to allocate to Drive processing
- compute the changeset using the allocated resources
- push the updates to the client when the computation is done
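As an illustration, the three steps above could be sketched as a small in-memory hub (names and structure are hypothetical, not an actual implementation): a fixed number of worker threads stands in for the allocated resources, and the computed changesets are pushed to every connected client instead of each client polling.

```python
import queue
import threading

class PushHub:
    """Hypothetical sketch of the push model."""

    def __init__(self, workers=2):
        self.changes = queue.Queue()   # raw changes waiting to be processed
        self.clients = {}              # client_id -> list of pushed updates
        self.workers = workers         # the "allocated resources"

    def connect(self, client_id):
        self.clients[client_id] = []

    def publish(self, doc_id, change):
        self.changes.put((doc_id, change))

    def _compute_changeset(self, doc_id, change):
        # Placeholder for the real changeset computation.
        return {"doc": doc_id, "change": change}

    def drain(self):
        # Process everything currently queued using the allocated workers,
        # then push the result to every connected client.
        def worker():
            while True:
                try:
                    doc_id, change = self.changes.get_nowait()
                except queue.Empty:
                    return
                update = self._compute_changeset(doc_id, change)
                for updates in self.clients.values():
                    updates.append(update)

        threads = [threading.Thread(target=worker) for _ in range(self.workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
```

The point of the sketch is that the processing cost is bounded by the worker count we choose, not by how often clients poll.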
Server-side python
The idea is to have a server-side python service that will handle the Drive processing.
This python service can:
- leverage Kafka/Event bus to be notified when something changes
- leverage Nuxeo API to get information about users and synchronization settings
- leverage python web stack to expose Push (WebSocket or SSE)
Once the initial infrastructure is in place, the Drive team should be able to iterate on the python service without having to rely on other teams.
In addition, our current Tomcat stack has some limitations when it comes to WebSockets and SSE, so having a dedicated python web stack to handle that could make a lot of sense.
10,000-foot architecture
Diagram
Here is a very naive architecture diagram
Building blocks
Pre-Processor
We probably do not want to plug directly into the default event stream, since only a subset of the events is relevant to Drive.
We may also define a dedicated message format so that we "bake" into the message most of the information Drive will need, in order to avoid too many back-and-forth calls.
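As a sketch, a hypothetical pre-processor could look like the following; the event names and document fields are illustrative and do not match the real Nuxeo event schema:

```python
# Event types Drive cares about (illustrative names, not the real ones).
DRIVE_EVENTS = {"documentCreated", "documentModified",
                "documentMoved", "documentTrashed"}

def preprocess(raw_event):
    """Filter the default event stream and enrich the kept messages."""
    if raw_event.get("eventId") not in DRIVE_EVENTS:
        return None  # drop events Drive does not care about
    doc = raw_event.get("doc", {})
    return {
        "event": raw_event["eventId"],
        "docId": doc.get("uid"),
        "path": doc.get("path"),
        "digest": doc.get("digest"),
        # pre-resolved here so the consumer avoids a callback to the server
        "impactedUsers": raw_event.get("impactedUsers", []),
    }
```

The enriched message carries everything downstream processing needs, which is exactly the "bake it in" idea above.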
Drive Consumer
We should be able to consume the messages using only Avro and the Kafka API.
The python consumer can then leverage the Nuxeo API to fetch the additional data needed and store the resulting update messages in a dedicated storage.
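A minimal sketch of that consumer loop, with the Kafka topic replaced by a plain iterable of already-decoded messages and the Nuxeo API by a caller-supplied callable (all names are hypothetical):

```python
def consume(messages, fetch_details, store):
    """Turn incoming messages into per-user update records.

    messages:      iterable of decoded dicts (stand-in for a Kafka topic)
    fetch_details: callable standing in for the Nuxeo API lookups
    store:         dict standing in for the dedicated storage
    """
    for msg in messages:
        details = fetch_details(msg["docId"])   # e.g. sync roots, blob info
        update = {**msg, **details}
        for user in msg.get("impactedUsers", []):
            store.setdefault(user, []).append(update)
    return store
```

In the real service the iterable would be a Kafka consumer decoding Avro records, but the shape of the loop stays the same.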
Storage
We are likely to need a dedicated storage to:
- store the updates that we cannot send to clients
- i.e. client is offline when the event is received
- store the updates while we wait to accumulate enough content before sending a push
- i.e. see if other messages arrive in the next 30s
- store the registration status of the drive devices
NB: this storage may not always need to be persistent on disk
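A rough in-memory sketch of such a store, covering the three responsibilities above; the injectable clock makes the 30s batching window explicit, and all names are hypothetical:

```python
import time

class UpdateStore:
    """Buffer updates per device until the device is online and the
    batching window (30s in the notes) has elapsed."""

    def __init__(self, window=30.0, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.pending = {}    # device_id -> (first_seen, [updates])
        self.online = set()  # registration status of drive devices

    def register(self, device_id, online=True):
        if online:
            self.online.add(device_id)
        else:
            self.online.discard(device_id)

    def add(self, device_id, update):
        _, updates = self.pending.setdefault(device_id, (self.clock(), []))
        updates.append(update)

    def flushable(self):
        # Devices whose batch is old enough AND that are currently online;
        # offline devices keep accumulating until they reconnect.
        now = self.clock()
        return [d for d, (t0, _) in self.pending.items()
                if d in self.online and now - t0 >= self.window]

    def flush(self, device_id):
        return self.pending.pop(device_id)[1]
```

Backed by something like Redis this would survive a service restart; as the note says, full on-disk persistence may not always be required.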
Endpoint
We need to check how well WebSocket and SSE are supported by existing python web stacks: my understanding is that the support is in any case much better than what we have in Tomcat.
We may also want to consider using some AWS services that could do most of the heavy lifting for us: the difficulty of scaling an SSE infrastructure should not be underestimated.
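Whichever stack is chosen, the SSE wire format itself is simple text framing (`id:`/`event:`/`data:` lines terminated by a blank line, per the SSE specification), so a small helper like this hypothetical one could format Drive updates for any python web stack:

```python
import json

def sse_frame(event, payload, event_id=None):
    """Format one update as a Server-Sent Events frame.

    The field names (id, event, data) and the blank-line terminator are
    fixed by the SSE specification; event/payload contents are ours.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets clients resume after reconnect
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"
```

The `id:` field is worth keeping: SSE clients send it back as `Last-Event-ID` on reconnect, which fits nicely with the "store updates while the client is offline" requirement above.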