Overview
Integrating Redis with Sentinel should allow the WorkManager to handle fallback gracefully.
This includes:
- detecting Redis failover
- continuing to work in memory during the fallback
- using the new Redis master once it is elected
NB: a custom addon will be released for 5.8.
About Failover strategy
There are at least two ways to handle Redis failover:
Block/Retry strategy
The first option is that, when Redis is not available, the code waits and retries later.
There are two types of accessors:
- the job schedulers:
  - they will wait until Redis is available again before scheduling their job
- the job runners:
  - they will simply wait for Redis to be back online
In addition, for job scheduling, we'll manage a timeout: if the Redis master is still not available after a configured delay, we raise an error and roll back the originating transaction.
This approach assumes the Redis failover will be relatively fast: this should hold as long as you have a large Redis cluster with a lot of nodes and you don't experience network partitions.
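The block/retry behaviour for job scheduling can be sketched as below. This is only an illustration under stated assumptions: `BlockingRetry`, `awaitAvailable` and `StoreUnavailableException` are hypothetical names, not part of the WorkManager or Jedis APIs, and the availability check is passed in as a plain `BooleanSupplier` rather than a real Redis call.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not a Nuxeo or Jedis class): block until a store is reachable. */
final class BlockingRetry {

    /** Unchecked, so that a surrounding transaction wrapper can catch it and roll back. */
    static final class StoreUnavailableException extends RuntimeException {
        StoreUnavailableException(String message) {
            super(message);
        }
    }

    private BlockingRetry() {
    }

    /**
     * Polls isAvailable until it returns true, sleeping retryDelayMillis between attempts.
     * Throws StoreUnavailableException once timeoutMillis has elapsed without success.
     * Returns the number of attempts that were needed.
     */
    static int awaitAvailable(BooleanSupplier isAvailable, long timeoutMillis, long retryDelayMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int attempts = 0;
        while (true) {
            attempts++;
            if (isAvailable.getAsBoolean()) {
                return attempts;
            }
            // Compare via subtraction so the check stays correct if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                throw new StoreUnavailableException(
                        "store still unavailable after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up instead of blocking further.
                Thread.currentThread().interrupt();
                throw new StoreUnavailableException("interrupted while waiting for the store");
            }
        }
    }
}
```

In this sketch a job scheduler would call `awaitAvailable` with the configured timeout before queuing its job and let the exception propagate so that the originating transaction is rolled back. A job runner would use the same loop without the deadline check.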
Fallback scheduling strategy
Another approach is that, when Redis queuing is not available, we schedule the jobs in an in-memory queue; when Redis is back, we empty the in-memory queue before switching back to Redis.
This approach makes the overall system more resilient to Redis failover, especially when the failover is slow. However, there is a trade-off:
- when in-memory queuing is used, the jobs will be lost if the Nuxeo node goes down
- for thread-safety reasons, this approach requires significant changes inside the current WorkManager code
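The fallback idea can be sketched as follows. This is a simplified illustration, not the WorkManager design: `FallbackScheduler` and its nested `RemoteQueue` interface are hypothetical names standing in for the Redis-backed queue, job identifiers are plain strings, and a single coarse lock is used where the real code would need finer-grained thread safety.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch: buffer jobs in memory while the remote queue is down. */
final class FallbackScheduler {

    /** Stand-in for the Redis-backed queue; an assumption, not a Nuxeo or Jedis API. */
    interface RemoteQueue {
        boolean isAvailable();

        void push(String jobId);
    }

    private final RemoteQueue remote;

    // Jobs held here are lost if the node goes down (the trade-off noted above).
    private final Queue<String> memory = new ArrayDeque<>();

    FallbackScheduler(RemoteQueue remote) {
        this.remote = remote;
    }

    /** Schedules remotely when possible, otherwise buffers the job in memory. */
    synchronized void schedule(String jobId) {
        if (remote.isAvailable()) {
            // Empty the in-memory queue first so that jobs keep their original order.
            drain();
            remote.push(jobId);
        } else {
            memory.add(jobId);
        }
    }

    /** Number of jobs currently waiting in the in-memory queue. */
    synchronized int pendingInMemory() {
        return memory.size();
    }

    // Called with the lock held. A failure of push() in the middle of a drain
    // is not handled in this sketch.
    private void drain() {
        String pending;
        while ((pending = memory.poll()) != null) {
            remote.push(pending);
        }
    }
}
```

The `synchronized` methods keep the sketch correct but serialize all schedulers. Avoiding that kind of global lock while still draining in order is part of why this approach needs deeper changes.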
Implementation choices
How we will move forward
In 5.8, we cannot safely implement the second approach, so the strategy we are working on is:
in 5.8:
- we implement the first approach (block)
- we upgrade the Jedis client
- we push that as a hotfix
in 6.0:
- we start with the block strategy
- we'll see in 6.0 or 6.x whether we can rework the WorkManager architecture to allow the fallback scheduling approach
- depends on NXP-15160: integrate nosql unit in redis test features (Resolved)