[NXP-27471] Expose stream processor failures as metrics - Nuxeo Issue Tracker

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 10.10
Fix Version/s: 11.1, 2021.0
Component/s: Streams

Epic Link: Resiliency
Tags:
- nxcore
- nxplatform
Sprint: nxplatform 11.1.11
Story Points: 3

Description

Since ~~NXP-27164~~ there is a probe that reports stream processor failures. Because probes are exposed through the runningstatus as the health status, a single record that causes a systematic processing error can block the entire system.
A proper retry policy mitigates temporary failures (a service that is unavailable, or a failure that requires human intervention), but it does not help with a buggy record that fails systematically.
So instead of activating a probe for the stream processor, we could expose metrics on processing in error that can be used as a warning in a monitoring dashboard.
This way ops can choose when to restart a Nuxeo node instead of having nodes automatically blacklisted or restarted.

The solution is to expose a counter metric that is incremented when a processing terminates because of an error. In addition, even when the probe is disabled, it would be useful for the stream processor probe output to list which processing is failing.
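
The counter idea can be illustrated with a minimal sketch. This is not the Nuxeo implementation: the class `StreamFailureMetrics` and its methods `process`, `failureCount` and `failing` are hypothetical names, and the sketch uses only the JDK rather than the metrics library Nuxeo actually relies on. It shows a per-computation counter that is incremented when processing terminates on error, plus a snapshot that a probe output could use to list what is failing.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: class and method names are illustrative, not Nuxeo APIs.
final class StreamFailureMetrics {

    // One failure counter per computation name.
    private final Map<String, LongAdder> failures = new ConcurrentHashMap<>();

    /**
     * Runs the processing of one record. If it terminates with an error,
     * the failure counter for that computation is incremented instead of
     * propagating the exception. Returns true on success.
     */
    boolean process(String computation, Runnable processing) {
        try {
            processing.run();
            return true;
        } catch (RuntimeException e) {
            failures.computeIfAbsent(computation, name -> new LongAdder()).increment();
            return false;
        }
    }

    /** Total number of terminations on error, the value a dashboard would chart. */
    long failureCount() {
        return failures.values().stream().mapToLong(LongAdder::sum).sum();
    }

    /** Sorted snapshot of failing computations and their failure counts. */
    Map<String, Long> failing() {
        Map<String, Long> snapshot = new TreeMap<>();
        failures.forEach((name, count) -> snapshot.put(name, count.sum()));
        return snapshot;
    }
}
```

In this sketch a failure never changes a health status by itself. A dashboard can alert when `failureCount()` is above zero, and `failing()` tells the operator which processing to look at before deciding to restart a node.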

Attachments

- Grafana Stream failure counter.png, 2019-06-10 08:50, 3 kB, Benoit Delbosc
- screenshot-1.png, 2019-06-10 08:51, 17 kB, Benoit Delbosc
- screenshot-2.png, 2019-06-10 08:53, 21 kB, Benoit Delbosc
- screenshot-3.png, 2019-06-10 08:54, 30 kB, Benoit Delbosc
- screenshot-4.png, 2019-06-10 08:55, 45 kB, Benoit Delbosc

Issue Links

depends on

- NXP-27525 Disabling healthCheck probe does not work (Resolved)

is related to

- NXP-28481 streamStatus probe should detect all abnormal computation termination (Resolved)
- NXP-28043 Backport Stream Processor probe to the runningstatus (Resolved)
- NXDOC-1936 Add a Nuxeo Stream section about error handling (Resolved)
- NXP-27529 Provide a recovery procedure for systematic failure in a stream processor (Resolved)
- NXP-28094 Add Nuxeo Stream probe to health check by default (Resolved)
- NXP-27164 Add a Stream Processor probe to the runningstatus (Resolved)

People

Assignee: Benoit Delbosc
Reporter: Benoit Delbosc
Participants: Benoit Delbosc, Jenkins
Votes: 0
Watchers: 2

Dates

Created: 2019-06-03 14:48
Updated: 2020-12-17 16:35
Resolved: 2019-06-11 08:48
