[NXP-27164] Add a Stream Processor probe to the runningstatus - Nuxeo Issue Tracker

Details

Type: Task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 10.10
Fix Version/s: 11.1, 2021.0
Component/s: Streams

Release Notes Description:
Since 11.1 you can activate a health check probe to check the status of stream processors.
The option to activate in nuxeo.conf is:

nuxeo.stream.healthCheck.enabled=true

If a stream processor fails after retries and its failover policy is to stop on error, the runningstatus will be in error.
When this happens, the Nuxeo node must be restarted for processing to continue.
Note that the health check probe is not activated by default.
Epic Link:
Resiliency
Tags:
- nxcore
Upgrade notes:

The Probe and ProbeStatus classes have been moved to the nuxeo-runtime-management Maven module, under the package org.nuxeo.runtime.management.api.

Sprint:
nxcore 11.1.7, nxcore 11.1.8 / 11.1.9
Story Points:
3

Description

Since 10.10, a computation can follow a retry policy. If the fallback after retries is to terminate the computation, the instance must be restarted to get the computation back.

If the stream is not distributed (using Chronicle Queue and not Kafka), the computation needs to be restarted before the stream retention duration (4 days by default) expires, or we experience data loss.

Even with Kafka, a very long outage can result in all computations terminating, in which case at least one instance needs to be restarted.

A possible solution is to add a new probe to the runningstatus. This probe will return KO once a computation has terminated on failure.

When the runningstatus is used as a health check by a load balancer, a computation failure becomes a red alert, which is the desired behavior.
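The proposed behavior can be illustrated with a minimal, self-contained sketch. It does not use the Nuxeo Probe or ProbeStatus API: ComputationFailureTracker, SketchStatus and StreamProbeSketch are hypothetical names introduced only for this illustration. It assumes the stream processor reports a computation to the tracker once it has terminated after exhausting its retries.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical registry (not a Nuxeo class): remembers the names of
// computations that terminated on failure after their retries.
final class ComputationFailureTracker {
    private final Set<String> failed = new ConcurrentSkipListSet<>();

    // Assumed to be called by the processor when a computation gives up.
    void recordFailure(String computationName) {
        failed.add(computationName);
    }

    // Returns a sorted snapshot so callers never see concurrent changes.
    Set<String> failedComputations() {
        return new TreeSet<>(failed);
    }
}

// Hypothetical result type standing in for a probe status.
final class SketchStatus {
    final boolean ok;
    final String message;

    SketchStatus(boolean ok, String message) {
        this.ok = ok;
        this.message = message;
    }
}

// Hypothetical probe: OK while no computation has terminated on failure,
// KO from the first failure onwards (failures are never cleared, which
// mirrors the need to restart the node).
final class StreamProbeSketch {
    private final ComputationFailureTracker tracker;

    StreamProbeSketch(ComputationFailureTracker tracker) {
        this.tracker = tracker;
    }

    SketchStatus run() {
        Set<String> failed = tracker.failedComputations();
        if (failed.isEmpty()) {
            return new SketchStatus(true, "No failure");
        }
        return new SketchStatus(false, "Terminated on failure: " + String.join(", ", failed));
    }
}
```

In this sketch the status stays KO until the process restarts, so a load balancer polling the status would keep the node out of rotation until an operator intervenes.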

Attachments

StreamStatus Probe output in admin center.png
28 kB
2019-06-06 13:11

Issue Links

is related to

NXP-28043 Backport Stream Processor probe to the runningstatus

Resolved

NXP-27471 Expose stream processor failures as metrics

Resolved

People

Assignee:

Pierre Gautier

Reporter:

Benoit Delbosc

Participants:

Benoit Delbosc, Jenkins, Pierre Gautier

Votes:

0

Watchers:

4

Dates

Created:

2019-04-04 16:13

Updated:

2020-12-17 16:35

Resolved:

2019-05-23 08:44