Nuxeo Platform / NXP-28481

streamStatus probe should detect all abnormal computation termination


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.10
    • Fix Version/s: 10.10-HF22, 11.1
    • Component/s: Streams

      Description

      Since NXP-27471 and NXP-28094, the streamProbe detects failures that occur during processing (in the computation's user code), and a metric can be used for alerting.

      There are still code paths where a failure is not reported as such:

      1. In computation code, when asking for termination after an unrecoverable error, calling askForTermination performs an intentional termination, so the probe doesn't report any failure. To fix this, an exception must be raised so that the fallback policy can be applied and the probe reports the failure.
      For instance, this is the case in AbstractBulkComputation if the KVStore is not readable:

      2020-01-06T12:06:57,906 ERROR [myActionComputationPool-00] [org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation] Stopping processing, unknown command: 5d20a75d-5ae3-4cf3-8cfb-45f459f883e9, offset: bulkDatasetExport-00:+78456167596035, record: Record{watermark=206874717140025344, wmDate=2020-01-06 11:40:35.602, flags=[DEFAULT], key='5d20a75d-5ae3-4cf3-8cfb-45f459f883e9:1', data.length=160, data="....%'..Y.H5d20a75d-5ae3-4cf3-8cfb-45f459f883e9.Hd4365bec-2831-4be1-a5d4-15eb43bb68adH08097371-9c40-46e4-abaf-84352ed5a797Hb667"}.
      
      

      2. In the ComputationRunner code, errors are not reported as a failure by the probe, for instance when Kafka is not reachable or the consumer position cannot be committed.

      We need to make sure that abnormal termination is reported as a failure by the probe.
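
      The two cases above can be illustrated with a minimal, self-contained sketch. It does not use the real Nuxeo classes: Context, Computation, FailureMetric and runOnce are hypothetical stand-ins, and only the name askForTermination is taken from this issue. The sketch assumes that a runner treats any exception, whether raised by the computation or by the commit step, as an abnormal termination and records it where a probe could read it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ProbeSketch {

    /** Hypothetical stand-in for a computation context (not the real Nuxeo type). */
    static class Context {
        private boolean terminationRequested;

        void askForTermination() {
            terminationRequested = true;
        }

        boolean isTerminationRequested() {
            return terminationRequested;
        }
    }

    /** Hypothetical computation contract used only for this illustration. */
    interface Computation {
        void processRecord(Context context, String record);
    }

    /** Hypothetical failure counter that a probe or an alerting metric could read. */
    static final class FailureMetric {
        private final AtomicInteger failures = new AtomicInteger();

        void markFailure() {
            failures.incrementAndGet();
        }

        boolean isHealthy() {
            return failures.get() == 0;
        }
    }

    /**
     * Processes one record, then commits the position.
     * Returns true if the runner should continue, false if it must stop.
     */
    static boolean runOnce(Computation computation, String record,
            Runnable commitPosition, FailureMetric metric) {
        Context context = new Context();
        try {
            computation.processRecord(context, record);
            if (context.isTerminationRequested()) {
                // Intentional termination: stop without recording a failure.
                return false;
            }
            commitPosition.run();
            return true;
        } catch (RuntimeException e) {
            // Abnormal termination (case 1 or case 2): record it for the probe.
            metric.markFailure();
            return false;
        }
    }

    public static void main(String[] args) {
        FailureMetric metric = new FailureMetric();
        Runnable commitOk = () -> { };

        // Case 1, before the fix: the computation stops silently.
        Computation silent = (ctx, rec) -> ctx.askForTermination();
        runOnce(silent, "r1", commitOk, metric);
        System.out.println("healthy after askForTermination: " + metric.isHealthy());

        // Case 1, after the fix: the computation raises an exception instead.
        Computation failing = (ctx, rec) -> {
            throw new IllegalStateException("unknown command: " + rec);
        };
        runOnce(failing, "r1", commitOk, metric);
        System.out.println("healthy after exception: " + metric.isHealthy());
    }
}
```

      In this sketch a commit step that throws is caught by the same handler, so a failure in the runner itself (case 2) is recorded the same way as a failure in the computation code.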

      Note that this is different from NXP-28524, which focuses on improving resiliency when Kafka is not reachable.

    Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0 minutes
    • Time Spent: 2 days
