[NXP-14689] Ensure proper fallback if Redis is not available - Nuxeo Issue Tracker

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.0
Component/s: Redis

Description

Overview

Integrating Redis with Sentinel should allow the WorkManager to handle failover gracefully.

This includes:

detecting Redis failover
continuing to work in memory during the failover
using the new Redis master once it is elected

NB: a custom add-on will be released for 5.8.

About the failover strategy

There are at least two ways to handle Redis failover:

Block/Retry strategy

The first option is that, when Redis is not available, the code waits and retries later.

There are two types of accessors:

the job schedulers:
- they wait until Redis is available again before scheduling their jobs
the job runners:
- they simply wait for Redis to be back online

In addition, for job scheduling, we'll manage a timeout: if the Redis master is still not available after a configured delay, we'll raise an error and roll back the originating transaction.

This approach assumes that Redis failover will be relatively fast. This should hold as long as you have a large Redis cluster with many nodes and you don't experience network partitions.
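A minimal sketch of the block/retry behaviour, assuming a hypothetical `RedisQueue` abstraction that throws a hypothetical `RedisUnavailableException` while no Redis master is reachable; neither is a Nuxeo WorkManager or Jedis API. The scheduler side retries until a deadline and then throws a `TimeoutException`, which the caller would turn into a transaction rollback; the runner side retries with no deadline. The 100 ms retry delay is an illustrative value only.

```java
import java.util.ArrayDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BlockRetrySketch {

    /** Hypothetical: signals that no Redis master is currently reachable. */
    static class RedisUnavailableException extends Exception {
        RedisUnavailableException(String message) { super(message); }
    }

    /** Hypothetical abstraction over the Redis-backed work queue (not a Nuxeo or Jedis API). */
    interface RedisQueue {
        void push(String workId) throws RedisUnavailableException;
        String pop() throws RedisUnavailableException; // null when the queue is empty
    }

    // Illustrative pause between two attempts while failover is in progress.
    static final long RETRY_DELAY_MS = 100;

    /**
     * Scheduler side: keep retrying while Redis is unavailable, but give up after
     * timeoutMs so the caller can roll back the originating transaction.
     */
    static void schedule(RedisQueue queue, String workId, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            try {
                queue.push(workId);
                return; // scheduled successfully
            } catch (RedisUnavailableException e) {
                // Overflow-safe comparison of nanoTime values.
                if (System.nanoTime() - deadline >= 0) {
                    throw new TimeoutException("Redis unavailable for more than " + timeoutMs + " ms");
                }
                Thread.sleep(RETRY_DELAY_MS);
            }
        }
    }

    /** Runner side: no timeout, simply wait for Redis to be back online. */
    static String nextWork(RedisQueue queue) throws InterruptedException {
        while (true) {
            try {
                return queue.pop();
            } catch (RedisUnavailableException e) {
                Thread.sleep(RETRY_DELAY_MS);
            }
        }
    }

    /** Test double: behaves as if Redis were down for the first `failures` calls. */
    static class FlakyQueue implements RedisQueue {
        private final ArrayDeque<String> items = new ArrayDeque<>();
        private int failures;

        FlakyQueue(int failures) { this.failures = failures; }

        private void checkAvailable() throws RedisUnavailableException {
            if (failures > 0) {
                failures--;
                throw new RedisUnavailableException("master down");
            }
        }

        public void push(String workId) throws RedisUnavailableException {
            checkAvailable();
            items.addLast(workId);
        }

        public String pop() throws RedisUnavailableException {
            checkAvailable();
            return items.pollFirst();
        }
    }
}
```

A caller would catch the `TimeoutException` from `schedule` and mark its transaction for rollback. `FlakyQueue` is only a test double that simulates an outage lasting a fixed number of calls.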

Fallback scheduling strategy

Another approach is that, when Redis queuing is not available, we schedule jobs in an in-memory queue, and when Redis is back, we drain the in-memory queue before switching back to Redis.

This approach makes the overall system more resilient to Redis failover, especially when failover is slow. However, there are trade-offs:

when in-memory queuing is used, jobs are lost if the Nuxeo node goes down
for thread-safety reasons, this approach requires significant changes inside the current WorkManager code
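A minimal sketch of the fallback scheduling idea, under similar assumptions: `RedisQueue`, `RedisUnavailableException` and `FallbackScheduler` are hypothetical names, not WorkManager or Jedis APIs, and a single coarse lock stands in for the much more involved thread-safety work a real WorkManager change would need. Jobs go to Redis while it answers, are parked in memory once a push fails, and the memory queue is drained in order before new jobs are sent to Redis again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class FallbackSchedulingSketch {

    /** Hypothetical: signals that no Redis master is currently reachable. */
    static class RedisUnavailableException extends Exception {
        RedisUnavailableException(String message) { super(message); }
    }

    /** Hypothetical abstraction over the Redis-backed work queue (not a Nuxeo or Jedis API). */
    interface RedisQueue {
        void push(String workId) throws RedisUnavailableException;
    }

    static class FallbackScheduler {
        private final RedisQueue redis;
        // Jobs parked here are lost if this node goes down.
        private final Deque<String> memoryQueue = new ArrayDeque<>();
        private boolean redisAvailable = true;

        FallbackScheduler(RedisQueue redis) { this.redis = redis; }

        /** Never blocks: switches to the in-memory queue as soon as a Redis push fails. */
        synchronized void schedule(String workId) {
            if (redisAvailable) {
                try {
                    redis.push(workId);
                    return;
                } catch (RedisUnavailableException e) {
                    redisAvailable = false; // failover detected, stay in memory mode
                }
            }
            memoryQueue.addLast(workId);
        }

        /**
         * Called once a new master is believed to be available: drain the in-memory
         * queue in order, and only then switch new jobs back to Redis. Holding the
         * lock prevents a new job from overtaking the parked ones.
         */
        synchronized boolean tryRecover() {
            while (!memoryQueue.isEmpty()) {
                try {
                    redis.push(memoryQueue.peekFirst());
                } catch (RedisUnavailableException e) {
                    return false; // still down, keep the remaining jobs in memory
                }
                memoryQueue.pollFirst(); // remove only after a successful push
            }
            redisAvailable = true;
            return true;
        }

        synchronized int pendingInMemory() { return memoryQueue.size(); }
    }

    /** Test double whose availability can be toggled to simulate a failover. */
    static class SwitchableRedis implements RedisQueue {
        final List<String> received = new ArrayList<>();
        volatile boolean up = true;

        public void push(String workId) throws RedisUnavailableException {
            if (!up) throw new RedisUnavailableException("master down");
            received.add(workId);
        }
    }
}
```

The single lock is a simplification chosen to keep ordering obvious; it also serializes all scheduling on the node.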
Implementation choices

How we will move forward

In 5.8, we cannot safely implement the second approach, so the strategy we are working on is:

in 5.8:

we implement the first approach (block)
we upgrade the Jedis client
we ship this as a hotfix

in 6.0:

we start with the block strategy
we'll see in 6.0 or 6.x if we can rework the WorkManager architecture to allow the fallback scheduling approach

Issue Links

depends on

NXP-15160 integrate nosql unit in redis test features

Resolved

People

Assignee:

Stéphane Lacoin

Reporter:

Thierry Delprat

Participants:

Stéphane Lacoin, Thierry Delprat

Votes:

0

Watchers:

2

Dates

Created:

2014-07-01 14:40

Updated:

2014-09-22 12:59

Resolved:

2014-09-22 12:59