- Type: Epic
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: None
- Component/s: Streams
- Tags:
- Team(s): PLATFORM
- Completion Level (0 to 5): 5
The goal is to have an overview of the stream processing at the cluster level in order to:
- quickly understand where there is a bottleneck or problem without having to SSH into nodes
- decide how to scale out, scale down, or tune the existing configuration
- build a dedicated scaling metric that can be used by a Horizontal Pod Autoscaler (HPA).
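As an illustration of the last point, here is a minimal sketch of how a single cluster-level value could feed an autoscaler. The replica rule is the one documented for the Kubernetes HPA (ceil(currentReplicas * currentMetricValue / desiredMetricValue)); everything else is an assumption made for the example: the choice of "worst consumer-group latency" as the metric, the function names, the group names and the 30 s target are not defined by this ticket.

```python
import math
from typing import Dict


def worst_latency_seconds(latency_by_group: Dict[str, float]) -> float:
    """Hypothetical scaling metric: the highest consumer-group latency in the cluster."""
    return max(latency_by_group.values(), default=0.0)


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Kubernetes HPA rule: ceil(currentReplicas * currentMetricValue / desiredMetricValue).

    The HPA tolerance band and min/max replica bounds are left out of this sketch;
    the floor of 1 replica below is an assumption standing in for minReplicas.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return max(1, math.ceil(current_replicas * current_value / target_value))


# Hypothetical latencies (in seconds) per consumer group.
latencies = {"bulk/scroller": 2.0, "work/default": 90.0, "audit/writer": 0.5}
metric = worst_latency_seconds(latencies)
print(desired_replicas(current_replicas=2, current_value=metric, target_value=30.0))  # prints 6
```

Taking the maximum rather than an average keeps one badly lagging consumer group from being hidden by idle ones, but it is only one possible choice.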
Because Nuxeo Stream is used at a low level, the overview will cover all async processing: Async listeners, WorkManager, Bulk Service, and of course Nuxeo Stream (when using Kafka).
We want a representation at the cluster level that includes:
- all streams used with their number of partitions
- all Nuxeo nodes that participate in the async processing, with the number of threads for each computation
- the lag and latency for each consumer group
- computation failures
- possibly, for each node: CPU usage and JVM memory pressure
The idea is to report all processor topologies on node start (NXP-29934) and to create a specific stream metrics reporter (NXP-29933) that reports activity. A computation will aggregate both streams and build a representation that will be exposed through REST (NXP-29935).
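The aggregation step could look roughly like the sketch below. It is not the design of NXP-29935: the record types, field names, stream names and the JSON shape are all hypothetical, chosen only to show how topology records and metrics records can be merged into one cluster-level view.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class TopologyRecord:
    """Hypothetical record published once by a node on start."""
    node_id: str
    threads: Dict[str, int]      # computation name -> thread count on this node
    partitions: Dict[str, int]   # stream name -> number of partitions


@dataclass
class MetricsRecord:
    """Hypothetical record published periodically by the metrics reporter."""
    node_id: str
    timestamp: int               # seconds since epoch
    lag: Dict[str, int]          # consumer group -> pending records
    failures: Dict[str, int]     # computation name -> cumulative failures on this node


def aggregate(topologies: Iterable[TopologyRecord], metrics: Iterable[MetricsRecord]) -> dict:
    """Merge both record streams into a single cluster-level representation."""
    streams: Dict[str, int] = {}
    nodes: Dict[str, Dict[str, int]] = {}
    for t in topologies:
        streams.update(t.partitions)
        nodes[t.node_id] = dict(t.threads)

    lag: Dict[str, int] = {}
    failures_by_node: Dict[str, Dict[str, int]] = {}
    # Process in time order so that the most recent report wins.
    for m in sorted(metrics, key=lambda r: r.timestamp):
        lag.update(m.lag)
        failures_by_node[m.node_id] = m.failures

    # Failure counts are cumulative per node, so sum only the latest report of each node.
    failures: Dict[str, int] = {}
    for node_failures in failures_by_node.values():
        for computation, count in node_failures.items():
            failures[computation] = failures.get(computation, 0) + count

    return {"streams": streams, "nodes": nodes, "lag": lag, "failures": failures}


topologies = [
    TopologyRecord("node-1", {"bulk/scroller": 2}, {"bulk/command": 4}),
    TopologyRecord("node-2", {"bulk/scroller": 2}, {"bulk/command": 4}),
]
metrics = [
    MetricsRecord("node-1", 100, {"bulk/scroller": 10}, {"bulk/scroller": 1}),
    MetricsRecord("node-2", 105, {"bulk/scroller": 7}, {"bulk/scroller": 0}),
    MetricsRecord("node-1", 110, {"bulk/scroller": 3}, {"bulk/scroller": 2}),
]
view = aggregate(topologies, metrics)
print(json.dumps(view, indent=2, sort_keys=True))
```

Latency, CPU usage and JVM memory pressure are omitted here to keep the sketch short; they would be carried the same way as lag or as extra per-node fields.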
- is related to: NXP-29945 Provide a tool to analyze the content of the default WorkManager queue (Resolved)