[NXS-5476] Invalidate Workspace node's cache - Nuxeo Issue Tracker

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: Git Access

Roadmap Milestone: NOS.infra.LTS2019
Tags:
Team: NOS

Description

In the current deployment, each node has its own Workspace (project Git clone) cache.
Each node has its own invalidation mechanism, based on a pool that maintains access and performs cleanup depending on usage.

It works well when all the work is done on the same node, but as we move more and more to a services-based architecture, we'll see more requests hitting several nodes, which increases the risk of hitting a node with an outdated cache.

Observations:

A Git project can contain assets, which makes cloning costly.
The current workspace pool has a "dirty" state.

Proposed Solutions:

Cluster-wide event/cache invalidation (codebase side): only relies on pushed changes
To propagate workspace pool invalidation, we could rely on our PubSub events to send an invalidation event to all the nodes.
It would still have drawbacks around commits that have not been pushed.
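
The pub/sub idea above can be sketched as follows. This is only an illustration under stated assumptions: `InvalidationBus` is an in-process stand-in for the real PubSub channel, and `Invalidation`, `NodeWorkspacePool` and their methods are hypothetical names, not existing Nuxeo classes.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical invalidation message: which node pushed, and for which project. */
final class Invalidation {
    final String originNode;
    final String projectId;

    Invalidation(String originNode, String projectId) {
        this.originNode = originNode;
        this.projectId = projectId;
    }
}

/** In-process stand-in (assumption) for a cluster-wide pub/sub channel. */
class InvalidationBus {
    private final List<Consumer<Invalidation>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<Invalidation> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(Invalidation event) {
        subscribers.forEach(s -> s.accept(event));
    }
}

/** Hypothetical per-node pool that marks its clone stale when another node pushes. */
class NodeWorkspacePool {
    private final String nodeId;
    private final InvalidationBus bus;
    private final Set<String> stale = ConcurrentHashMap.newKeySet();

    NodeWorkspacePool(String nodeId, InvalidationBus bus) {
        this.nodeId = nodeId;
        this.bus = bus;
        bus.subscribe(this::onInvalidation);
    }

    private void onInvalidation(Invalidation event) {
        // The node that pushed already has the latest commits; only the others are stale.
        if (!nodeId.equals(event.originNode)) {
            stale.add(event.projectId);
        }
    }

    /** Called after a successful push from this node. */
    void pushed(String projectId) {
        bus.publish(new Invalidation(nodeId, projectId));
    }

    boolean isStale(String projectId) {
        return stale.contains(projectId);
    }

    /** Called once this node has fetched the latest changes. */
    void refreshed(String projectId) {
        stale.remove(projectId);
    }
}
```

Only `pushed` triggers an event in this sketch, which is the drawback noted above: commits that were never pushed are invisible to the other nodes.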

Shared invalidation state with Hazelcast: relies on an infra update
Instead of propagating invalidations using pub/sub, we could also keep track of invalidations in a cluster-shared object using Hazelcast.
But it would need an infra upgrade to have Hazelcast set up.
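
A minimal sketch of the shared-state idea, assuming a per-project version counter kept in a cluster-shared map. The code is written against `java.util.concurrent.ConcurrentMap`, which Hazelcast's `IMap` extends, so a plain `ConcurrentHashMap` stands in for it here. `VersionedWorkspaceCache` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-node view that compares a local version with a shared one. */
class VersionedWorkspaceCache {
    // Shared across the cluster (assumption: backed by a Hazelcast map in production).
    private final ConcurrentMap<String, Long> sharedVersions;
    // Version of each project that this node's clone currently reflects.
    private final Map<String, Long> localVersions = new ConcurrentHashMap<>();

    VersionedWorkspaceCache(ConcurrentMap<String, Long> sharedVersions) {
        this.sharedVersions = sharedVersions;
    }

    /** Called after a successful push from this node: bump the shared version. */
    void recordPush(String projectId) {
        long version = sharedVersions.merge(projectId, 1L, Long::sum);
        localVersions.put(projectId, version);
    }

    /** The local clone is stale when another node has bumped the shared version. */
    boolean isStale(String projectId) {
        long shared = sharedVersions.getOrDefault(projectId, 0L);
        long local = localVersions.getOrDefault(projectId, 0L);
        return local < shared;
    }

    /** Called once this node has fetched the latest changes. */
    void markRefreshed(String projectId) {
        localVersions.put(projectId, sharedVersions.getOrDefault(projectId, 0L));
    }
}
```

Compared with pub/sub, a node that missed an event (for example, because it was restarting) still sees the right state the next time it reads the shared map.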

Request routing (network): hard to implement
Use some kind of session affinity for workspace access that keeps track of which node holds the workspace clone currently in use.
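
As a rough illustration of the affinity idea, the sketch below remembers which node owns the in-use clone of each project and keeps sending requests there until the workspace is released. `WorkspaceAffinityRouter`, the node names and the hash-based initial choice are all assumptions; a real setup would live in the load balancer or a shared registry rather than in one JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical router that pins each project to the node holding its clone. */
class WorkspaceAffinityRouter {
    private final List<String> nodes;
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    WorkspaceAffinityRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    /** Returns the node that should serve requests for this project. */
    String route(String projectId) {
        // First request picks a node by hash; later requests reuse the recorded owner.
        return owners.computeIfAbsent(projectId,
                id -> nodes.get(Math.floorMod(id.hashCode(), nodes.size())));
    }

    /** Called when the workspace clone is cleaned up, so the next request may go anywhere. */
    void release(String projectId) {
        owners.remove(projectId);
    }
}
```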

Shared in-memory repository clone (codebase side): way too experimental IMHO, and repos in memory don't sound good
JGit has an in-memory repository implementation (https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/dfs/InMemoryRepository.java#L28) that would allow us to keep the repository only in memory. With the help of Hazelcast to share objects in a cluster, we could easily have a cluster-wide workspace pool, and all the RW access would be done on the same object.
It might be a solution worth evaluating, together with a more aggressive cleanup strategy and some tests to see how it impacts performance and what the memory footprint would be under our load with concurrent users.

Attachments

Issue Links

is related to

NXS-4558 NOS: Migration to Nuxeo 10.10

Resolved

Activity

People

Assignee:

Arnaud Kervern

Reporter:

Arnaud Kervern

Participants:

Arnaud Kervern, Florent Guillaume, Thierry Delprat

Votes:

0

Watchers:

4

Dates

Created:

2019-08-16 20:46

Updated:

2020-11-25 13:03

Resolved:

2020-11-25 13:03