[NXP-27047] Improve resilience of Nuxeo in case of infrastructure failures, namely ES erratic response times - Nuxeo Issue Tracker

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Won't Fix
Affects Version/s: 9.10
Fix Version/s: None
Component/s: Audit, Elasticsearch
Environment:
Drive <-> Nuxeo <-> ES (audit)

Tags:
- SupCom
- nxcore
Backlog priority:
800

Description

When the audit log is stored in Elasticsearch and ES suddenly exhibits very large response times (1000+ times larger than usual), the Nuxeo internal audit pipeline can break. Restoring the audit pipeline then requires a Nuxeo restart.

The request is to fix this behavior so that the pipeline can recover on its own, without requiring a full Nuxeo node restart.

When the Nuxeo audit stops working, dependent features are affected. For example, the Drive clients' synchronization mechanism can stop as a consequence, which makes this a very visible failure.
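
To illustrate the kind of self-recovery requested above, here is a minimal sketch of a write call guarded by a per-attempt timeout and retried with exponential backoff. It is not Nuxeo code and does not reflect how the audit writer or the stream computations are implemented. The class name `AuditWriteRetrySketch`, the method `callWithRetry`, and all of its parameters are hypothetical, chosen only for this example.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not part of Nuxeo: bounds each attempt with a timeout
// and retries with exponential backoff instead of blocking forever.
public class AuditWriteRetrySketch {

    /**
     * Runs op with a per-attempt timeout. On timeout or failure, waits
     * (doubling the delay each time) and tries again, up to maxAttempts.
     * Rethrows the last failure if every attempt fails.
     */
    public static <T> T callWithRetry(Callable<T> op, int maxAttempts,
            Duration attemptTimeout, Duration initialBackoff) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        // A cached pool lets a new attempt start even if a cancelled one lingers.
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            long backoffMs = initialBackoff.toMillis();
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = executor.submit(op);
                try {
                    return future.get(attemptTimeout.toMillis(), TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    future.cancel(true); // interrupt the slow call
                    last = e;
                } catch (ExecutionException e) {
                    last = e; // the call itself failed
                }
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2;
                }
            }
            throw last;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would wrap the slow operation, for example `callWithRetry(() -> writeBatch(entries), 5, Duration.ofSeconds(30), Duration.ofSeconds(1))`, where `writeBatch` stands in for whatever performs the write. Cancellation relies on the wrapped call responding to thread interruption, so a call that ignores interrupts would keep running in the background.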

Issue Links

is related to

NXP-25312 Add a retry policy to Stream Computation

Resolved

NXP-25341 Use new batch retry computation for audit writer

Resolved

People

Assignee:

Unassigned

Reporter:

Patrick Abgrall

Participants:

Benoit Delbosc, Patrick Abgrall

Votes:

1

Watchers:

6

Dates

Created:

2019-03-20 14:18

Updated:

2019-06-18 07:31

Resolved:

2019-06-17 09:48