  1. Nuxeo Drive
  2. NXDRIVE-2035

Provide a new Drive synchronization using streams/push

    Details

    • Type: Epic
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Synchronizer

      Description

      https://docs.google.com/presentation/d/15rPZQZVTY9d8iVdiV50ViPSnwAvpd36EdsloZptRabY/edit?usp=sharing

       
      -------------------------------------------------------------------------

      It seems that we have not made much progress on this topic.

      Reading the slides that Yannis put together as a summary of the discussions we had in Paris, I am not sure we captured everything, so it may make sense to start a proper spec.

      Sharing below the notes I made prior to the meeting in Paris: hopefully, this can be used as a starting point.
      As already discussed, it would be great to draft a first architecture and discuss it together.

      Goals

      Scalability

      Currently, the Drive server-side workload directly increases with both the number of documents synchronized and the number of connected customers.

      If we want Drive to be usable at scale, we need to change this model.

      Independence

      Currently, the Drive team's skills are mainly in Python, whereas the Drive server side is written in pure Java.

      This makes it harder for the Drive team to address performance issues efficiently, since they always depend on another team to make changes and adjustments on the server side.

      Strategy

      Push vs Pull

      In the current model, all Drive clients poll the server at regular intervals to fetch the changes.

      By definition, adding clients or adding documents will increase the server-side load.

      In order to have a workload that we can control, we need to reverse the model and rely on push:

      • define what resources we want to allocate to Drive processing
      • compute the changeset using the allocated resources
      • push the updates to the client when the computation is done
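
      The three steps above can be sketched as a fixed pool of workers draining a queue. This is only an illustration of the idea, not a proposed implementation: `compute_changeset`, `worker`, `run`, and the `push` callable are hypothetical names, and the changeset computation is a placeholder. The point is that the number of workers is a setting we choose, so the processing capacity no longer follows the number of clients.

```python
import asyncio


async def compute_changeset(user: str) -> dict:
    """Placeholder (assumption): the real computation would query Nuxeo."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return {"user": user, "changes": []}


async def worker(queue: asyncio.Queue, push) -> None:
    """Take users from the queue, compute their changeset, then push it."""
    while True:
        user = await queue.get()
        try:
            push(await compute_changeset(user))
        finally:
            queue.task_done()


async def run(users, workers: int = 2) -> list:
    """Process all users with a fixed number of workers (the allocated resources)."""
    queue: asyncio.Queue = asyncio.Queue()
    pushed: list = []  # stand-in for the real push transport (WebSocket or SSE)
    tasks = [asyncio.create_task(worker(queue, pushed.append)) for _ in range(workers)]
    for user in users:
        queue.put_nowait(user)
    await queue.join()  # wait until every queued user has been handled
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return pushed


if __name__ == "__main__":
    print(asyncio.run(run(["alice", "bob", "carol"], workers=2)))
```

      Adding users makes the queue longer, not the worker pool bigger: latency grows, but the server-side load stays within what we allocated.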

      Server-side Python

      The idea is to have a server-side Python service that handles the Drive processing.

      This Python service can:

      • leverage Kafka/the event bus to be notified when something changes
      • leverage the Nuxeo API to get information about users and synchronization settings
      • leverage a Python web stack to expose push (WebSocket or SSE)

      Once the initial infrastructure is in place, the Drive team should be able to iterate on the Python service without having to rely on other teams.
      In addition, our current Tomcat stack has some limitations when it comes to WebSockets and SSE, so having a dedicated Python web stack to handle that could make a lot of sense.

      10,000 feet architecture

      Diagram

      Here is a very naive architecture diagram

      Building blocks

      Pre-Processor
      We probably will not want to plug directly into the default event stream, since only some of the events are relevant to Drive.
      We may also define a dedicated message format so that most of the information Drive needs is "baked" into the message, which avoids too many round trips.
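
      A minimal sketch of what such a pre-processor could look like, assuming events arrive as dictionaries. The set of event names, the `DriveMessage` fields, and the `preprocess` function are all assumptions made for illustration; the real list of relevant events and the real message format still have to be specified.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the events Drive cares about; the actual list is to be defined.
DRIVE_EVENTS = {"documentCreated", "documentModified", "documentRemoved"}


@dataclass
class DriveMessage:
    """Hypothetical dedicated format carrying what Drive needs up front."""
    event: str
    doc_id: str
    path: str
    digest: Optional[str]


def preprocess(raw: dict) -> Optional[DriveMessage]:
    """Drop events Drive does not need; reshape the others."""
    if raw.get("event") not in DRIVE_EVENTS:
        return None
    return DriveMessage(
        event=raw["event"],
        doc_id=raw["doc_id"],
        path=raw.get("path", ""),
        digest=raw.get("digest"),
    )
```

      Filtering this early keeps the Drive-specific stream small, and baking the path and digest into the message means the consumer does not have to call back for them.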

      Drive Consumer
      We should be able to consume the messages using only Avro and the Kafka API.
      The Python consumer can then leverage the Nuxeo API to fetch the additional data needed and store the resulting update messages in a dedicated storage.
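
      The consume, enrich, and store loop could be as small as the sketch below. It deliberately leaves out the Kafka polling and the Avro decoding: `records` stands in for messages already read and decoded, `fetch_details` stands in for a call to the Nuxeo API, and `store` stands in for the dedicated storage. All three names are assumptions, not existing APIs.

```python
from typing import Callable, Iterable, List


def consume(
    records: Iterable[dict],
    fetch_details: Callable[[str], dict],
    store: List[dict],
) -> int:
    """Enrich each record with extra data and store the resulting update.

    Returns the number of update messages stored.
    """
    count = 0
    for record in records:
        details = fetch_details(record["doc_id"])  # stand-in for a Nuxeo API call
        store.append({**record, **details})        # stand-in for the dedicated storage
        count += 1
    return count


if __name__ == "__main__":
    updates: List[dict] = []
    decoded = [{"doc_id": "42", "event": "documentModified"}]
    consume(decoded, lambda doc_id: {"title": f"doc-{doc_id}"}, updates)
    print(updates)
```

      Keeping the transport and the enrichment behind plain callables would also make the loop easy to test without a running Kafka or Nuxeo instance.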

      Storage
      We are likely to need a dedicated storage to:

      • store the updates that we cannot send to clients
        • e.g. the client is offline when the event is received
      • store the updates while we wait for enough content before sending a push
        • e.g. see if other messages arrive in the next 30s
      • store the registration status of the Drive devices

      NB: this storage may not always need to be persistent on disk
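
      To make the three roles concrete, here is an in-memory sketch (in line with the note that the storage may not need to be on disk). The `UpdateStore` class, its method names, and the injectable `clock` are assumptions for illustration; the 30-second window comes from the example above.

```python
import time
from collections import defaultdict


class UpdateStore:
    """Holds pending updates per device until they can be pushed as a batch."""

    def __init__(self, batch_window: float = 30.0, clock=time.monotonic):
        self.batch_window = batch_window
        self.clock = clock
        self.online = {}                   # device_id -> registration/online status
        self.pending = defaultdict(list)   # device_id -> updates not sent yet
        self.first_seen = {}               # device_id -> time of oldest pending update

    def register(self, device_id: str, online: bool = True) -> None:
        self.online[device_id] = online

    def add(self, device_id: str, update) -> None:
        if not self.pending[device_id]:
            self.first_seen[device_id] = self.clock()
        self.pending[device_id].append(update)

    def flush_ready(self) -> dict:
        """Return batches for online devices whose batching window has elapsed."""
        now = self.clock()
        ready = {}
        for device_id, updates in list(self.pending.items()):
            if not updates or not self.online.get(device_id, False):
                continue  # nothing to send, or device offline: keep storing
            if now - self.first_seen[device_id] >= self.batch_window:
                ready[device_id] = updates
                del self.pending[device_id]
                del self.first_seen[device_id]
        return ready
```

      Updates for an offline device simply stay in `pending` and are returned by the first `flush_ready()` call after the device registers as online again.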

      Endpoint
      We need to check how well WebSocket and SSE are supported by existing Python web stacks: my understanding is that it is in any case much better than the support we have in Tomcat.
      We may also want to consider using some AWS services that could do most of the heavy lifting for us: the difficulty of scaling an SSE infrastructure should not be underestimated.
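
      Whatever stack is chosen, the SSE wire format itself is simple: text fields such as `id:`, `event:` and `data:`, one per line, with a blank line ending each event. The helper below is a sketch using only the standard library; `format_sse` and the `changeset` event name are assumptions, not part of any existing Drive or Nuxeo API.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one update as a Server-Sent Events frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # The payload is JSON-encoded; each line of it needs its own "data:" field.
    for chunk in json.dumps(data).splitlines():
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(format_sse({"doc_id": "42"}, event="changeset", event_id="7"), end="")
```

      Setting the `id` field is worth considering: on reconnection, an SSE client sends the last identifier it received in the `Last-Event-ID` header, which could help replay the updates stored while the client was offline.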

      References

      Notes about the Push and SSE work started in 2018

      Presentation about Zuul that we already discussed together

            People

            • Assignee:
              Unassigned
              Reporter:
              yachour Yannis Achour