- Type: Epic
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: None
- Component/s: Synchronizer
https://docs.google.com/presentation/d/15rPZQZVTY9d8iVdiV50ViPSnwAvpd36EdsloZptRabY/edit?usp=sharing
-------------------------------------------------------------------------
It seems that we have not made much progress on this topic.
Reading the slides that Yannis put together as a summary of the discussions we had in Paris, I am not sure we captured everything, so it may make sense to initiate a proper spec.
Sharing below the notes I made prior to the meeting in Paris: hopefully, they can be used as a starting point.
As already discussed, it would be great to draft a first architecture and discuss it together.
Goals
Scalability
Currently, the Drive server-side workload directly increases with both the number of documents synchronized and the number of connected customers.
If we want Drive to be usable at scale, we need to change this model.
Independence
Currently, the Drive team's skills are mainly in Python whereas the Drive server side is pure Java.
This makes it more difficult for the Drive team to efficiently address the performance issues, since they always rely on another team to make changes and adjustments on the server side.
Strategy
Push vs Pull
In the current model, all Drive clients poll the server on a regular basis to fetch the changes.
By definition, adding clients or adding documents will increase the server-side load.
In order to have a workload that we can control, we need to reverse the model and rely on push:
- define what resources we want to allocate to Drive processing
- compute the changeset using the allocated resources
- push the updates to the client when the computation is done
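As an illustration, the three steps above could be sketched as a small in-memory hub (names and structure are hypothetical, not an actual implementation): a fixed number of worker threads stands in for the allocated resources, and the computed changesets are pushed to every connected client instead of each client polling.

```python
import queue
import threading

class PushHub:
    """Hypothetical sketch of the push model."""

    def __init__(self, workers=2):
        self.changes = queue.Queue()   # raw changes waiting to be processed
        self.clients = {}              # client_id -> list of pushed updates
        self.workers = workers         # the "allocated resources"

    def connect(self, client_id):
        self.clients[client_id] = []

    def publish(self, doc_id, change):
        self.changes.put((doc_id, change))

    def _compute_changeset(self, doc_id, change):
        # Placeholder for the real changeset computation.
        return {"doc": doc_id, "change": change}

    def drain(self):
        # Process everything currently queued using the allocated workers,
        # then push the result to every connected client.
        def worker():
            while True:
                try:
                    doc_id, change = self.changes.get_nowait()
                except queue.Empty:
                    return
                update = self._compute_changeset(doc_id, change)
                for updates in self.clients.values():
                    updates.append(update)

        threads = [threading.Thread(target=worker) for _ in range(self.workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
```

The point of the sketch is that the processing cost is bounded by the worker count we choose, not by how often clients poll.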
Server-side python
The idea is to have a server-side python service that will handle the Drive processing.
This python service can:
- leverage Kafka/Event bus to be notified when something changes
- leverage Nuxeo API to get information about users and synchronization settings
- leverage python web stack to expose Push (WebSocket or SSE)
Once the initial infrastructure is in place, the Drive team should be able to iterate on the python service without having to rely on other teams.
In addition, our current Tomcat stack has some limitations when it comes to WebSockets and SSE, so having a dedicated python web stack to handle that could make a lot of sense.
10,000-foot architecture
Diagram
Here is a very naive architecture diagram
Building blocks
Pre-Processor
We probably do not want to plug directly into the default event stream, since only a subset of the events is relevant to Drive.
We may also define a dedicated message format so that we "bake" into the message most of the information Drive will need, in order to avoid too many back-and-forth calls.
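As a sketch, a hypothetical pre-processor could look like the following; the event names and document fields are illustrative and do not match the real Nuxeo event schema:

```python
# Event types Drive cares about (illustrative names, not the real ones).
DRIVE_EVENTS = {"documentCreated", "documentModified",
                "documentMoved", "documentTrashed"}

def preprocess(raw_event):
    """Filter the default event stream and enrich the kept messages."""
    if raw_event.get("eventId") not in DRIVE_EVENTS:
        return None  # drop events Drive does not care about
    doc = raw_event.get("doc", {})
    return {
        "event": raw_event["eventId"],
        "docId": doc.get("uid"),
        "path": doc.get("path"),
        "digest": doc.get("digest"),
        # pre-resolved here so the consumer avoids a callback to the server
        "impactedUsers": raw_event.get("impactedUsers", []),
    }
```

The enriched message carries everything downstream processing needs, which is exactly the "bake it in" idea above.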
Drive Consumer
We should be able to consume the messages using only Avro and the Kafka API.
The python consumer can then leverage the Nuxeo API to fetch the additional data needed and store the resulting update messages in a dedicated storage.
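A minimal sketch of that consumer loop, with the Kafka topic replaced by a plain iterable of already-decoded messages and the Nuxeo API by a caller-supplied callable (all names are hypothetical):

```python
def consume(messages, fetch_details, store):
    """Turn incoming messages into per-user update records.

    messages:      iterable of decoded dicts (stand-in for a Kafka topic)
    fetch_details: callable standing in for the Nuxeo API lookups
    store:         dict standing in for the dedicated storage
    """
    for msg in messages:
        details = fetch_details(msg["docId"])   # e.g. sync roots, blob info
        update = {**msg, **details}
        for user in msg.get("impactedUsers", []):
            store.setdefault(user, []).append(update)
    return store
```

In the real service the iterable would be a Kafka consumer decoding Avro records, but the shape of the loop stays the same.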
Storage
We are likely to need a dedicated storage to:
- store the updates that we cannot send to clients
- i.e. client is offline when the event is received
- store the updates while we wait to accumulate enough content before sending a push
- i.e. see if other messages arrive in the next 30s
- store the registration status of the drive devices
NB: this storage may not always need to be persistent on disk
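A rough in-memory sketch of such a store, covering the three responsibilities above; the injectable clock makes the 30s batching window explicit, and all names are hypothetical:

```python
import time

class UpdateStore:
    """Buffer updates per device until the device is online and the
    batching window (30s in the notes) has elapsed."""

    def __init__(self, window=30.0, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.pending = {}    # device_id -> (first_seen, [updates])
        self.online = set()  # registration status of drive devices

    def register(self, device_id, online=True):
        if online:
            self.online.add(device_id)
        else:
            self.online.discard(device_id)

    def add(self, device_id, update):
        _, updates = self.pending.setdefault(device_id, (self.clock(), []))
        updates.append(update)

    def flushable(self):
        # Devices whose batch is old enough AND that are currently online;
        # offline devices keep accumulating until they reconnect.
        now = self.clock()
        return [d for d, (t0, _) in self.pending.items()
                if d in self.online and now - t0 >= self.window]

    def flush(self, device_id):
        return self.pending.pop(device_id)[1]
```

Backed by something like Redis this would survive a service restart; as the note says, full on-disk persistence may not always be required.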
Endpoint
We need to check how well WebSocket and SSE are supported by existing python web stacks: my understanding is that the support is in any case much better than what we have in Tomcat.
We may also want to consider using some AWS services that could do most of the heavy lifting for us: the difficulty of scaling an SSE infrastructure should not be underestimated.
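Whichever stack is chosen, the SSE wire format itself is simple text framing (`id:`/`event:`/`data:` lines terminated by a blank line, per the SSE specification), so a small helper like this hypothetical one could format Drive updates for any python web stack:

```python
import json

def sse_frame(event, payload, event_id=None):
    """Format one update as a Server-Sent Events frame.

    The field names (id, event, data) and the blank-line terminator are
    fixed by the SSE specification; event/payload contents are ours.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets clients resume after reconnect
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"
```

The `id:` field is worth keeping: SSE clients send it back as `Last-Event-ID` on reconnect, which fits nicely with the "store updates while the client is offline" requirement above.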