- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: 10.10-HF58, 2021.15
- Component/s: Monitoring
- Release Notes Summary: Long bulk commands and slow stream processing are now traced
- Tags:
- Team: PLATFORM
- Sprint: nxplatform #53
- Story Points: 3
When processing takes a long time, we should have logs that help understand what is going on.
A first example is to trace any bulk command and its associated status when the document set is large.
This can be done in the BulkScroller: as soon as more than 100k items are scrolled, we could trace something like:
BBC: <id> (Big Bulk Command) detected scrolling more than 100k: <command dump>
// When the scrolling is done we could have
BBC: <id>: scroll completed <> items
// When the command is completed
BBC: <id> completed: <command status>
This way, while processing is ongoing, it is easy to pinpoint the command and who is running it; it is also possible to get the current status through the REST API.
This will also provide useful statistics on long-running commands and how to tune them.
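The detection logic described above could be sketched as follows. This is only an illustration of the threshold check: the class, method, and field names are hypothetical, not the actual Nuxeo BulkScroller API, and a real implementation would emit the message through the platform logger instead of returning it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of "Big Bulk Command" detection while scrolling.
// The BBC log line is emitted once, the first time the scroll count
// crosses the 100k threshold, so the command can be spotted early.
class BulkScrollerSketch {
    static final long BIG_COMMAND_THRESHOLD = 100_000;

    final String commandId;
    final AtomicLong scrolled = new AtomicLong();
    boolean bigCommandLogged;

    BulkScrollerSketch(String commandId) {
        this.commandId = commandId;
    }

    /** Returns the BBC trace line on the first threshold crossing, else null. */
    String onScrolled(long count) {
        long total = scrolled.addAndGet(count);
        if (!bigCommandLogged && total > BIG_COMMAND_THRESHOLD) {
            bigCommandLogged = true;
            return "BBC: " + commandId
                    + " (Big Bulk Command) detected scrolling more than 100k";
        }
        return null;
    }
}
```

Logging only on the first crossing keeps the trace to a single line per command, while the command id in the message lets an operator correlate it with the status exposed by the REST API.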
Another idea is to trace slow processing at the record level; this could pinpoint any long record processing (including slow Work).
Here the idea is to use the timer metric around processRecord in the computation runner: once more than 100 records have been processed, we could dump any record whose processing duration exceeds 2 minutes and is the maximum observed so far.
try (Timer.Context context = processRecordTimer.time()) {
    // ... call processRecord with retry
    long duration = context.stop(); // elapsed time in nanoseconds
    if (processRecordTimer.getCount() > 100
            && duration > TimeUnit.MINUTES.toNanos(2)
            && duration >= processRecordTimer.getSnapshot().getMax()) {
        // ... trace record as slow processing
    }
}