Type: Task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 10.10
Component/s: Streams
Release Notes Description:
Epic Link:
Tags:
Upgrade notes:
Sprint: nxcore 11.1.7, nxcore 11.1.8 / 11.1.9
Story Points: 3

Since 10.10, a computation can follow a retry policy. If the fallback after retries is to terminate the computation, the instance must be restarted to get the computation running again.
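To make the failure mode concrete, here is a minimal sketch of a retry policy whose fallback terminates the computation. `RetryPolicy`, `ComputationRunner` and their fields are hypothetical names chosen for illustration; this is not the Nuxeo Stream API, only a model of the behavior described above: once retries are exhausted the computation stops, and nothing in the running instance brings it back.

```java
import java.util.concurrent.Callable;

/** Hypothetical retry policy (illustrative names, not the Nuxeo Stream API). */
final class RetryPolicy {
    final int maxRetries;
    final boolean terminateOnFailure; // fallback applied once retries are exhausted

    RetryPolicy(int maxRetries, boolean terminateOnFailure) {
        this.maxRetries = maxRetries;
        this.terminateOnFailure = terminateOnFailure;
    }
}

/** Hypothetical runner that applies a RetryPolicy to each record it processes. */
final class ComputationRunner {
    enum State { RUNNING, TERMINATED_ON_FAILURE }

    private final RetryPolicy policy;
    private State state = State.RUNNING;

    ComputationRunner(RetryPolicy policy) {
        this.policy = policy;
    }

    State state() {
        return state;
    }

    /** Returns true if the record was processed, false if skipped or terminated. */
    boolean process(Callable<Void> work) {
        if (state != State.RUNNING) {
            return false; // a terminated computation never resumes on its own
        }
        // One initial attempt plus maxRetries retries
        for (int attempt = 0; attempt <= policy.maxRetries; attempt++) {
            try {
                work.call();
                return true;
            } catch (Exception e) {
                // swallow and try again while the retry budget lasts
            }
        }
        if (policy.terminateOnFailure) {
            state = State.TERMINATED_ON_FAILURE; // only a fresh runner (instance restart) resets this
        }
        return false; // without termination, the record is skipped and processing continues
    }
}
```

In this model the `TERMINATED_ON_FAILURE` state is sticky for the lifetime of the runner, which is why an instance restart is the only recovery path.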
If the stream is not distributed (Chronicle Queue rather than Kafka), the computation must be restarted before the stream retention duration expires (4 days by default); otherwise data is lost.
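The restart deadline can be reasoned about as simple arithmetic: records older than the retention are purged, so the oldest record the terminated computation has not yet processed determines how long an operator has to restart it. The sketch below uses a simplified model that assumes purging by record age; the real queue purges whole rolled files, so the actual cut-off is coarser. `RetentionDeadline` and its methods are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

/** Simplified model of the restart deadline for a non-distributed stream (hypothetical helper). */
final class RetentionDeadline {
    // Default retention mentioned in the issue: 4 days
    static final Duration DEFAULT_RETENTION = Duration.ofDays(4);

    /** Latest instant at which the oldest unprocessed record is still retained. */
    static Instant restartDeadline(Instant oldestUnprocessedRecord, Duration retention) {
        return oldestUnprocessedRecord.plus(retention);
    }

    /** True if restarting at restartAt comes too late and unprocessed records are already purged. */
    static boolean dataLost(Instant oldestUnprocessedRecord, Duration retention, Instant restartAt) {
        return restartAt.isAfter(restartDeadline(oldestUnprocessedRecord, retention));
    }
}
```

Because the oldest unprocessed record was appended no later than the moment the computation terminated, the deadline is at most 4 days after the termination with the default retention, and possibly much less if the computation was already lagging.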
Even with Kafka, a very long outage can cause all computations to terminate, in which case at least one instance needs to be restarted.
A possible solution is to add a new probe to the runningstatus; this probe would return KO once a computation has terminated on failure.
When the runningstatus is used as a health check by a load balancer, a computation failure then becomes a red alert, which is the desired behavior.
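A minimal sketch of such a probe, assuming the stream runtime can notify it whenever a computation terminates on failure. `ComputationFailureProbe`, `markTerminatedOnFailure` and the `Status` type are hypothetical names standing in for whatever probe contract the runningstatus actually expects; this is not the Nuxeo probe API.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical probe: reports KO as soon as any computation has terminated on failure. */
final class ComputationFailureProbe {
    /** Minimal status object standing in for the real probe result type. */
    static final class Status {
        final boolean ok;
        final String info;

        Status(boolean ok, String info) {
            this.ok = ok;
            this.info = info;
        }
    }

    // Names of computations that terminated on failure; thread-safe because
    // computations run on their own threads while the probe is polled separately
    private final Set<String> failed = ConcurrentHashMap.newKeySet();

    /** Called (hypothetically) by the stream runtime when a retry fallback terminates a computation. */
    void markTerminatedOnFailure(String computation) {
        failed.add(computation);
    }

    /** OK while every computation runs; KO listing the failed ones otherwise. */
    Status run() {
        if (failed.isEmpty()) {
            return new Status(true, "all computations running");
        }
        // Sorted copy keeps the message stable across calls
        return new Status(false, "terminated on failure: " + new TreeSet<>(failed));
    }
}
```

The failed set is never cleared in this sketch: since a terminated computation only comes back after an instance restart, the probe stays KO until the restart creates a fresh probe, which keeps the load balancer alert red until the problem is actually addressed.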