Nuxeo Platform / NXP-27164

Add a Stream Processor probe to the runningstatus

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.10
    • Fix Version/s: 11.1, 2021.0
    • Component/s: Streams
    • Release Notes Description:
      Since 11.1, you can activate a health check probe that reports the status of stream processors.
      To activate it, set the following option in nuxeo.conf:

      nuxeo.stream.healthCheck.enabled=true
      

      If a stream processor fails after retries and its failover policy is to stop on error, the runningstatus will be in error.
      When this happens, the Nuxeo node must be restarted for processing to continue.
      Note that the health check probe is not activated by default.
    • Upgrade notes:
      The Probe and ProbeStatus classes have been moved to the nuxeo-runtime-management Maven module, under the package org.nuxeo.runtime.management.api.
    • Sprint:
      nxcore 11.1.7, nxcore 11.1.8 / 11.1.9
    • Story Points:
      3

      Description

      Since 10.10, a computation can follow a retry policy. If the fallback after retries is to terminate the computation, the instance must be restarted to get the computation running again.

      If the stream is not distributed (using Chronicle Queue rather than Kafka), the computation needs to be restarted before the stream retention duration (4 days by default) elapses; otherwise data is lost.
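
      The retention constraint above amounts to simple date arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that a record becomes eligible for deletion exactly one retention period after it was appended, which simplifies how retention is actually applied.

```python
from datetime import datetime, timedelta, timezone

# Default retention quoted in the ticket (4 days); adjust if configured otherwise.
DEFAULT_RETENTION = timedelta(days=4)


def restart_deadline(oldest_unprocessed_at, retention=DEFAULT_RETENTION):
    """Latest moment at which the oldest unprocessed record is still retained.

    Assumption: a record expires exactly `retention` after it was appended.
    """
    return oldest_unprocessed_at + retention


def time_left(oldest_unprocessed_at, now, retention=DEFAULT_RETENTION):
    """Remaining time before data loss starts, never negative."""
    remaining = restart_deadline(oldest_unprocessed_at, retention) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    appended = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    print("restart before:", restart_deadline(appended).isoformat())
```

      In practice an operator would restart well before this deadline, since the computation also needs time to catch up on the backlog.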

      Even with Kafka, a very long outage can result in all computations terminating, in which case at least one instance needs to be restarted.

      A possible solution is to add a new probe to the runningstatus. This probe will return KO once a computation has terminated on failure.
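
      The intended probe behavior can be modeled in a few lines. This is a toy model and not the Nuxeo implementation (which is written in Java); the class and method names are hypothetical. It only shows the key property: once a computation has terminated on failure, the probe stays KO until the process is restarted.

```python
class StreamProcessorProbe:
    """Toy model: the probe latches to KO after a computation fails."""

    def __init__(self):
        # Names of computations that terminated on failure, in order seen.
        self._failed = []

    def computation_terminated(self, name, on_failure):
        """Record a termination; only failures affect the probe status."""
        if on_failure and name not in self._failed:
            self._failed.append(name)

    def status(self):
        """Return a (state, message) pair, KO if any computation failed."""
        if self._failed:
            return "KO", "terminated on failure: " + ", ".join(self._failed)
        return "OK", "all computations running"


if __name__ == "__main__":
    probe = StreamProcessorProbe()
    probe.computation_terminated("myComputation", on_failure=True)
    print(probe.status())
```

      There is deliberately no method to clear a failure here, which mirrors the requirement that the node be restarted to resume processing.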

      When the runningstatus is used as a health check by a load balancer, a computation failure becomes a red alert, which is the desired behavior.
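
      A health check of this kind could be sketched as follows. The sketch assumes that the runningstatus endpoint answers with HTTP 200 when all probes pass and with an error status otherwise; the URL in the usage example is also an assumption about a default local install, so check both against your deployment. The `opener` parameter exists only to make the function testable without a running server.

```python
import urllib.error
import urllib.request


def node_is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if the health endpoint answers with HTTP 200.

    Assumption: a failing probe makes the endpoint return a non-200 status.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return False
    except (urllib.error.URLError, OSError):
        # Unreachable node or timeout: treat it as unhealthy as well.
        return False


if __name__ == "__main__":
    # Assumed URL for a local node; adjust host, port and context path.
    print(node_is_healthy("http://localhost:8080/nuxeo/runningstatus"))
```

      Treating an unreachable node the same as a failed probe is a design choice: for a load balancer, both cases mean traffic should not be routed to the node.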

       


      Time Tracking

      • Estimated: 0m
      • Remaining: 0m
      • Logged: 1m
