- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Component/s: Streams
- Release Notes Summary: There is a new Management endpoint for the autoscaler
- Tags:
- Team: PLATFORM
- Sprint: nxplatform #82, nxplatform #83
- Story Points: 5
The scale endpoint provides a metric that is the number of nodes to add (>0) or remove (<0), along with an explanation of why it scales out.
The target metric value should be 0 when used as a utilization target for autoscaling, as described by GCP for instance:
https://cloud.google.com/compute/docs/autoscaler/scaling-cloud-monitoring-metrics#configure_utilization_target
The scale value is also exposed as a metric.
See the endpoint documentation for more information:
https://doc.nuxeo.com/rest-api/1/stream-endpoint
Stream introspection is collected and provides:
- computation lag
- computation threads (cluster level)
- computation throughput per node, which can be aggregated at cluster level
When there is lag, we could compute an ETA with the current configuration and propose a scale-up target to improve that ETA.
When there is no lag, we could recommend scaling down some nodes.
Example for a specific computation:
- lag is 100k records
- 8 threads running (4 threads per node, 2 worker nodes)
- throughput is 6 records/s on node 1, 4 records/s on node 2, at cluster level: 10 records/s
The ETA is 100k / 10 = 10,000 s -> 2h46
- the computation is reading from a stream with 16 partitions
- we could add 8 threads by adding 2 nodes
- the throughput should be 20 records/s -> ETA 1h23
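The calculation above can be sketched as follows; this is an illustrative reconstruction, not the actual endpoint implementation, and the function and field names are hypothetical:

```python
# Hypothetical sketch of the ETA / scale-up proposal from the example above.
def eta_seconds(lag_records, throughput_rps):
    """Time to drain the lag at the given cluster throughput."""
    return lag_records / throughput_rps

def propose_scale_up(lag, node_throughputs, threads_per_node, partitions):
    """Suggest extra nodes, capped so total threads <= stream partitions."""
    nodes = len(node_throughputs)
    cluster_rps = sum(node_throughputs)          # 6 + 4 = 10 records/s
    per_thread_rps = cluster_rps / (nodes * threads_per_node)
    max_nodes = partitions // threads_per_node   # 16 partitions / 4 threads = 4 nodes
    target_rps = per_thread_rps * max_nodes * threads_per_node
    return {
        "eta_current_s": eta_seconds(lag, cluster_rps),  # 10,000 s -> 2h46
        "nodes_to_add": max_nodes - nodes,               # add 2 nodes
        "eta_target_s": eta_seconds(lag, target_rps),    # 5,000 s -> 1h23
    }

result = propose_scale_up(lag=100_000, node_throughputs=[6, 4],
                          threads_per_node=4, partitions=16)
```

The cap at one thread per partition reflects that adding threads beyond the partition count yields no extra parallelism.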
Another example: there is no lag on any computation and CPU usage is low on worker nodes, so propose scaling down.
The endpoint could output the current and target deployment.
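A possible output shape for that current/target deployment, using the numbers from the example; all field names are hypothetical, the documented response format is in the endpoint documentation linked above:

```python
# Hypothetical current/target deployment recommendation; field names are illustrative.
recommendation = {
    "metric": 2,  # >0: nodes to add, <0: nodes to remove, 0: no change needed
    "reason": "lag of 100k records, ETA 2h46 at current throughput",
    "current": {"nodes": 2, "threads": 8, "eta_s": 10000},
    "target": {"nodes": 4, "threads": 16, "eta_s": 5000},
}
```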