- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Component/s: Streams
- Release Notes Summary: There is a new Management endpoint for the autoscaler
- Tags:
- Team: PLATFORM
- Sprint: nxplatform #82, nxplatform #83
- Story Points: 5
The scale endpoint provides a metric that is the number of nodes to add (>0) or remove (<0), along with an explanation of why it scales out.
The target metric value should be 0 when used as a utilization target for autoscaling, as described by GCP for instance:
https://cloud.google.com/compute/docs/autoscaler/scaling-cloud-monitoring-metrics#configure_utilization_target
The scale value is also exposed as a metric.
See the endpoint documentation for more information:
https://doc.nuxeo.com/rest-api/1/stream-endpoint
Stream introspection is collected and provides:
- computation lag
- computation threads (cluster level)
- computation throughput per node, which can be aggregated at cluster level
When there is lag, we could compute an ETA with the current configuration and propose a scale-up target to improve that ETA.
When there is no lag, we could recommend scaling down some nodes.
Example for a specific computation:
- lag is 100k records
- 8 threads running (4 threads per node, 2 worker nodes)
- throughput is 6 records/s on node 1, 4 records/s on node 2, at cluster level: 10 records/s
The ETA is 100k / 10 = 10,000 s -> 2h46
- the computation is reading from a stream with 16 partitions
- we could add 8 threads by adding 2 nodes
- the throughput should be 20 records/s -> ETA 1h23
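The calculation above can be sketched as follows; this is an illustrative reconstruction, not the actual endpoint implementation, and the function and field names are hypothetical:

```python
# Hypothetical sketch of the ETA / scale-up proposal from the example above.
def eta_seconds(lag_records, throughput_rps):
    """Time to drain the lag at the given cluster throughput."""
    return lag_records / throughput_rps

def propose_scale_up(lag, node_throughputs, threads_per_node, partitions):
    """Suggest extra nodes, capped so total threads <= stream partitions."""
    nodes = len(node_throughputs)
    cluster_rps = sum(node_throughputs)          # 6 + 4 = 10 records/s
    per_thread_rps = cluster_rps / (nodes * threads_per_node)
    max_nodes = partitions // threads_per_node   # 16 partitions / 4 threads = 4 nodes
    target_rps = per_thread_rps * max_nodes * threads_per_node
    return {
        "eta_current_s": eta_seconds(lag, cluster_rps),  # 10,000 s -> 2h46
        "nodes_to_add": max_nodes - nodes,               # add 2 nodes
        "eta_target_s": eta_seconds(lag, target_rps),    # 5,000 s -> 1h23
    }

result = propose_scale_up(lag=100_000, node_throughputs=[6, 4],
                          threads_per_node=4, partitions=16)
```

The cap at one thread per partition reflects that adding threads beyond the partition count yields no extra parallelism.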
Another example: there is no lag on any computation and CPU usage is low on worker nodes, so propose scaling down.
The endpoint could output the current and target deployment.
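A possible output shape for that current/target deployment, using the numbers from the example; all field names are hypothetical, the documented response format is in the endpoint documentation linked above:

```python
# Hypothetical current/target deployment recommendation; field names are illustrative.
recommendation = {
    "metric": 2,  # >0: nodes to add, <0: nodes to remove, 0: no change needed
    "reason": "lag of 100k records, ETA 2h46 at current throughput",
    "current": {"nodes": 2, "threads": 8, "eta_s": 10000},
    "target": {"nodes": 4, "threads": 16, "eta_s": 5000},
}
```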