- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Component/s: Monitoring
- Epic Link:
- Upgrade notes:
- Team: PLATFORM
- Sprint: nxplatform 11.1.31, nxplatform 11.1.32
- Story Points: 0
In order to reduce the number of metrics, only useful metrics should be published to Graphite and Datadog.
Today, each timer in our code (Dropwizard Metrics) reports 16 metrics:
- min
- max
- mean
- stddev: standard deviation
- p50: 50th percentile (median)
- p75
- p95
- p98
- p99
- p999
- m1_rate: moving average of throughput over 1 minute
- m5_rate: same, over 5 minutes
- m15_rate: same, over 15 minutes
- mean_rate
- count: number of timer calls
- sum: total cumulative time
We have timers on the cache, directory, repository API, Elasticsearch API, work manager, and stream computation, so reducing the number of metrics per timer has a large effect on the total.
We could disable the following 7 metrics by default, leaving 9 metrics per timer (a reduction of about 44%):
- p95
- p99
- p999
- m5_rate
- m15_rate
- mean_rate
- sum
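The proposed reduction can be sketched as a simple attribute filter. This is a minimal, self-contained illustration in plain Java, not the actual Nuxeo or Dropwizard code: the class `TimerAttributeFilter` and its methods are hypothetical, and only the attribute names come from the lists in this ticket.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: computes which timer attributes remain published
// once the attributes proposed in this ticket are disabled by default.
class TimerAttributeFilter {

    // The 16 attributes a timer reports today, as listed in the ticket.
    static final List<String> ALL = Arrays.asList(
            "min", "max", "mean", "stddev",
            "p50", "p75", "p95", "p98", "p99", "p999",
            "m1_rate", "m5_rate", "m15_rate", "mean_rate",
            "count", "sum");

    // The 7 attributes proposed to be disabled by default.
    static final Set<String> DISABLED = new LinkedHashSet<>(Arrays.asList(
            "p95", "p99", "p999", "m5_rate", "m15_rate", "mean_rate", "sum"));

    // Returns the attributes still published, in their original order.
    static Set<String> enabled() {
        Set<String> kept = new LinkedHashSet<>(ALL);
        kept.removeAll(DISABLED);
        return kept;
    }

    // Fraction of attributes removed per timer (7 / 16 = 0.4375).
    static double reduction() {
        return (double) (ALL.size() - enabled().size()) / ALL.size();
    }

    public static void main(String[] args) {
        System.out.println("kept: " + enabled());
        System.out.println("reduction: " + reduction());
    }
}
```

Keeping the disabled set as data rather than hard-coding the kept attributes would make it easy to re-enable an attribute through configuration if it turns out to be needed.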
Other metrics should also be disabled by default, such as the nuxeo.ActionService timers. These can be profiled in a dev environment, and they should be neither a major performance problem nor relevant to the production environment.
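Disabling whole families of metrics by name could look like the following prefix-based filter. This is only a sketch under assumptions: the class `MetricNameFilter` is hypothetical, and the full metric names used in `main` are made up for illustration (only the `nuxeo.ActionService` prefix comes from this ticket).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical name filter: rejects metrics whose name starts with a denied prefix.
class MetricNameFilter implements Predicate<String> {

    private final List<String> deniedPrefixes;

    MetricNameFilter(String... deniedPrefixes) {
        this.deniedPrefixes = Arrays.asList(deniedPrefixes);
    }

    // Returns true when the metric should be published.
    @Override
    public boolean test(String metricName) {
        for (String prefix : deniedPrefixes) {
            if (metricName.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MetricNameFilter filter = new MetricNameFilter("nuxeo.ActionService");
        // Illustrative metric names, not taken from the actual registry.
        System.out.println(filter.test("nuxeo.ActionService.someAction")); // false
        System.out.println(filter.test("nuxeo.cache.someCache"));          // true
    }
}
```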