Nuxeo Platform / NXP-28508

Expose Nuxeo Stream latency metrics to Datadog

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.10
    • Fix Version/s: 10.10-HF23, 11.1
    • Component/s: Monitoring, Streams
    • Release Notes Description:
      Nuxeo Stream metrics about consumer lag and latency can now be exposed to Datadog using the stream.sh command:

      ./bin/stream.sh datadog -k --codec avro -l ALL -i 60 --api-key <DATADOG_API_KEY> --tags "staging:foo,project:bar"

       

      The exposed metrics are:

      • nuxeo.streams.lag: the lag of the consumer for the stream, in records.
      • nuxeo.streams.latency: the latency of the consumer for the stream, in microseconds.
      • nuxeo.streams.pos: the last checkpointed position of the consumer in the stream, in records.
      • nuxeo.streams.end: the end offset of the stream, in records.

      The additional Datadog tags are:

      • stream: the name of the stream
      • consumer: the name of the consumer group
      • partition: either "all" for an aggregated metric or a number for a specific partition
      • host: the name of the host that reported the metric (rarely useful, because the metrics are global to the cluster)
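
      As a rough illustration of how the four metrics and the tags above fit together, the following Python sketch builds one gauge entry per metric for a single stream consumer. The dictionary layout only loosely follows the shape of a Datadog series payload and is an assumption, not what stream.sh actually sends; build_tags, build_series and the stream and consumer names are hypothetical.

```python
import time

# The four metric names listed in the release notes.
METRICS = (
    "nuxeo.streams.lag",
    "nuxeo.streams.latency",
    "nuxeo.streams.pos",
    "nuxeo.streams.end",
)


def build_tags(stream, consumer, partition="all", extra=()):
    """Return 'key:value' tags for one stream consumer (hypothetical helper)."""
    tags = [f"stream:{stream}", f"consumer:{consumer}", f"partition:{partition}"]
    tags.extend(extra)  # user tags, e.g. the ones passed with --tags
    return tags


def build_series(values, stream, consumer, partition="all", extra=(), now=None):
    """Build one gauge entry per metric; `values` maps metric name to a number.

    The entry layout is an assumption made for illustration only.
    """
    timestamp = int(time.time()) if now is None else now
    tags = build_tags(stream, consumer, partition, extra)
    return [
        {
            "metric": name,
            "type": "gauge",
            "points": [[timestamp, values[name]]],
            "tags": tags,
        }
        for name in METRICS
    ]


# Hypothetical values for a consumer group "writer" on a stream "audit".
sample = {
    "nuxeo.streams.lag": 12,
    "nuxeo.streams.latency": 3500,
    "nuxeo.streams.pos": 988,
    "nuxeo.streams.end": 1000,
}
series = build_series(sample, "audit", "writer",
                      extra=["staging:foo", "project:bar"], now=0)
```

      With the default "all" partition this yields exactly four entries per stream consumer, all sharing the same tag list.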

       

    • Sprint:
      nxplatform 11.1.27, nxplatform 11.1.28
    • Story Points:
      5
    • Team:
      PLATFORM

      Description

      Since NXP-26248, stream.sh monitor can expose lag and latency metrics to Graphite.

      We want the same feature for Datadog.

      Datadog does not work well with wildcards, so metric names produced per stream, consumer and partition, like the following, are very hard to exploit in Datadog:
      server.<hostname>.nuxeo.stream.<stream_name>.<consumer_group>.<partition>.latency

      Instead, we need fewer, simpler metrics such as:

      • nuxeo.streams.lag: the lag of the consumer for the stream, in records.
      • nuxeo.streams.latency: the latency of the consumer for the stream, in microseconds.
      • nuxeo.streams.pos: the last checkpointed position of the consumer in the stream, in records.
      • nuxeo.streams.end: the end offset of the stream, in records.

      And use Datadog tags for the stream, consumer and partition dimensions:

      • stream:name the name of the stream
      • consumer:group the consumer group name
      • partition:partition the partition number, like 00 or 10, or all for the metric aggregated over the stream
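
      To make the mapping concrete, here is a minimal Python sketch that splits a Graphite-style metric path of the form shown earlier into a fixed metric name plus tags. It assumes that no path component (hostname, stream name, consumer group) contains a dot; graphite_to_datadog and the sample path are hypothetical and not part of stream.sh.

```python
def graphite_to_datadog(path):
    """Split a Graphite-style path into (metric name, tags, host).

    Expected layout, taken from the description:
    server.<hostname>.nuxeo.stream.<stream_name>.<consumer_group>.<partition>.<metric>
    Assumption: no component contains a dot.
    """
    parts = path.split(".")
    if len(parts) != 8 or parts[0] != "server" or parts[2:4] != ["nuxeo", "stream"]:
        raise ValueError(f"unexpected metric path: {path!r}")
    _, host, _, _, stream, consumer, partition, metric = parts
    tags = [
        f"stream:{stream}",
        f"consumer:{consumer}",
        f"partition:{partition}",
    ]
    return f"nuxeo.streams.{metric}", tags, host


# Hypothetical host, stream and consumer group names.
name, tags, host = graphite_to_datadog(
    "server.node1.nuxeo.stream.audit.writer.all.latency"
)
```

      The variable parts of the path move into tags, so the metric name itself stays one of the four fixed names and no wildcard is needed to query it.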

      Also, by default we don't need metrics per partition: the all aggregation should be good enough, and it reduces the number of metrics to 4 per stream consumer.
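
      The saving can be estimated with a small calculation. The Python sketch below counts the time series reported for one stream consumer; it assumes that per-partition reporting would add one group of four metrics per partition on top of the all aggregate, which the ticket does not state. series_count is a hypothetical helper.

```python
METRICS_PER_GROUP = 4  # lag, latency, pos, end


def series_count(partitions, per_partition=False):
    """Number of time series reported for one stream consumer.

    Assumption: per-partition reporting keeps the "all" aggregate and adds
    one group of four metrics for each partition.
    """
    groups = 1 + (partitions if per_partition else 0)
    return METRICS_PER_GROUP * groups


aggregated_only = series_count(12)                    # 4
with_partitions = series_count(12, per_partition=True)  # 4 * (1 + 12) = 52
```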

       

      Time Tracking

      • Estimated: Not Specified
      • Remaining: 0m
      • Logged: 3d
