[NXP-27675] Fix default retry policy for AuditWriter - Nuxeo Issue Tracker

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 10.10, 11.1-SNAPSHOT
Fix Version/s: 10.10-HF10, 11.1, 2021.0
Component/s: Streams

Release Notes Description:

Increase shortage tolerance from 7s to 15min by default.
Epic Link: Resiliency
Sprint: nxplatform 11.1.13
Story Points: 2

Description

Since NXP-25312 (Nuxeo 10.3), computations have a retry policy.
The policy for the audit log writer computation is:

maxRetries="3" delay="1s" maxDelay="10s" continueOnFailure="false"

This means 3 retries with an exponential backoff that starts at 1s and is capped at 10s, so the delays are 1, 2 and 4 seconds:

t: failure
t+1s: retry 1
t+3s: retry 2
t+7s: retry 3

With this configuration the tolerance is a shortage of 7 seconds. After that, the processor stops and a manual restart is required to resume activity.
There is no good reason not to tolerate a 15min shortage by default.

This could be done like this:

<policy name="AuditLogWriter" ...
maxRetries="20" delay="1s" maxDelay="60s" continueOnFailure="false" />

Time between retries for the first 10 retries:
1, 2, 4, 8, 16, 32, 60, 60, 60, 60 -> 303s (5min03)
then the 10 remaining retries: 10*60 -> 10min
tolerance: 15min03
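
The schedule above can be checked with a small sketch. It assumes, as the description implies, that the delay doubles after each attempt and is capped at maxDelay. The function names are hypothetical and are not part of the Nuxeo API, and the real retry implementation may differ in details such as jitter.

```python
def retry_delays(max_retries, delay, max_delay):
    """Return the wait (in seconds) before each retry.

    Assumption: the delay doubles after every attempt and never
    exceeds max_delay, as described in this issue.
    """
    return [min(delay * 2 ** i, max_delay) for i in range(max_retries)]


def tolerance(max_retries, delay, max_delay):
    """Total time covered by all retries: the longest shortage survived."""
    return sum(retry_delays(max_retries, delay, max_delay))


if __name__ == "__main__":
    # Current policy: maxRetries="3" delay="1s" maxDelay="10s"
    print(retry_delays(3, 1, 10), tolerance(3, 1, 10))

    # Proposed policy: maxRetries="20" delay="1s" maxDelay="60s"
    minutes, seconds = divmod(tolerance(20, 1, 60), 60)
    print(retry_delays(20, 1, 60)[:10], f"{minutes}min{seconds:02d}")
```

Under these assumptions the current policy covers 7 seconds and the proposed one covers a little over 15 minutes.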

The Elasticsearch re-index bulk action could also benefit from this retry policy, so that it can tolerate Elasticsearch failures.

Note that Nuxeo 9.10 has no retry policy mechanism: a failing computation stops the processing and requires a manual restart.

People

Assignee: Benoit Delbosc
Reporter: Benoit Delbosc
Participants: Benoit Delbosc, Jenkins

Dates

Created: 2019-07-02 08:19
Updated: 2020-12-17 16:33
Resolved: 2019-07-05 15:16
