[NXP-28481] streamStatus probe should detect all abnormal computation termination - Nuxeo Issue Tracker

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 10.10
Fix Version/s: 10.10-HF22, 11.1, 2021.0
Component/s: Streams

Release Notes Summary:
StreamStatus probe detects all abnormal computation termination.
Tags:
Backlog priority:
900
Team:
PLATFORM
Sprint:
nxplatform 11.1.27
Story Points:
3

Description

Since ~~NXP-27471~~ and ~~NXP-28094~~, the streamProbe detects failures raised during processing (in the computation's user code), and a metric is available for alerting.

There are still code paths where failure is not reported as such:

1. In computation code, when asking for termination after an unrecoverable error: calling askForTermination performs an intentional termination, so the probe doesn't report any failure. To fix this, an exception must be raised so that the fallback policy is applied and the probe reports the failure.
For instance, this is the case in AbstractBulkComputation if the KVStore is not readable:

2020-01-06T12:06:57,906 ERROR [myActionComputationPool-00] [org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation] Stopping processing, unknown command: 5d20a75d-5ae3-4cf3-8cfb-45f459f883e9, offset: bulkDatasetExport-00:+78456167596035, record: Record{watermark=206874717140025344, wmDate=2020-01-06 11:40:35.602, flags=[DEFAULT], key='5d20a75d-5ae3-4cf3-8cfb-45f459f883e9:1', data.length=160, data="....%'..Y.H5d20a75d-5ae3-4cf3-8cfb-45f459f883e9.Hd4365bec-2831-4be1-a5d4-15eb43bb68adH08097371-9c40-46e4-abaf-84352ed5a797Hb667"}.

2. In the ComputationRunner code, errors are not reported as failures by the probe, for instance when Kafka is not reachable or the consumer position cannot be committed.

We need to make sure that abnormal termination is reported as a failure by the probe.
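
The two code paths above can be illustrated with a minimal, self-contained sketch. It is not the Nuxeo API: the Context, Computation, Outcome and run names are hypothetical stand-ins, and retries and the real fallback policy are left out. It only shows why a requested termination looks like a normal stop, while an exception raised by user code or by the runner can be reported as a failure.

```java
import java.util.List;

/** Hypothetical, simplified model of a computation runner; not the Nuxeo API. */
public class TerminationSketch {

    /** Minimal stand-in for a computation context. */
    static final class Context {
        private boolean terminationRequested;

        void askForTermination() {
            terminationRequested = true;
        }

        boolean isTerminationRequested() {
            return terminationRequested;
        }
    }

    /** Stand-in for the computation user's code. */
    interface Computation {
        void processRecord(Context context, String record);
    }

    /** Outcome that a probe would inspect. */
    enum Outcome { COMPLETED, TERMINATED_ON_REQUEST, FAILED }

    /** Processes records; any exception, from user code or from the runner, is a failure. */
    static Outcome run(Computation computation, List<String> records, Runnable commitPosition) {
        Context context = new Context();
        try {
            for (String record : records) {
                computation.processRecord(context, record);
                if (context.isTerminationRequested()) {
                    // Indistinguishable from a wanted stop, so nothing is reported.
                    return Outcome.TERMINATED_ON_REQUEST;
                }
                // Runner-side step that can fail too (for example, committing a position).
                commitPosition.run();
            }
            return Outcome.COMPLETED;
        } catch (RuntimeException e) {
            // Abnormal termination: this is what the probe should report.
            return Outcome.FAILED;
        }
    }

    /** Before: the unrecoverable error is hidden behind a requested termination. */
    static final Computation SILENT = (context, record) -> {
        if (record.startsWith("unknown")) {
            context.askForTermination();
        }
    };

    /** After: the unrecoverable error raises an exception. */
    static final Computation FAILING = (context, record) -> {
        if (record.startsWith("unknown")) {
            throw new IllegalStateException("Unknown command: " + record);
        }
    };

    public static void main(String[] args) {
        List<String> records = List.of("ok-1", "unknown-2");
        Runnable commitOk = () -> { };
        Runnable commitKo = () -> {
            throw new IllegalStateException("Cannot commit consumer position");
        };
        System.out.println(run(SILENT, records, commitOk));           // TERMINATED_ON_REQUEST
        System.out.println(run(FAILING, records, commitOk));          // FAILED
        System.out.println(run(FAILING, List.of("ok-1"), commitKo));  // FAILED
    }
}
```

In this sketch the third call corresponds to the second code path: the user code succeeds, but the runner-side commit throws, and the outcome is still FAILED.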

Note that this is different from ~~NXP-28524~~, which focuses on improving resiliency when Kafka is not reachable.

Issue Links

is related to

NXP-27471 Expose stream processor failures as metrics

Resolved

Is referenced in

PR for master: #4165

PR for master: #4294

People

Assignee:

Benoit Delbosc

Reporter:

Benoit Delbosc

Participants:

Benoit Delbosc, Jenkins, Support Tech User

Dates

Created:

2020-01-07 08:05

Updated:

2020-12-17 16:32

Resolved:

2020-01-27 17:06
