Nuxeo Platform / NXP-31679

Worker nodes Scaling Management Endpoint


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2023.0, 2021.35
    • Component/s: Streams
    • Release Notes Summary:
      There is a new Management endpoint for autoscaling.
    • Tags:
    • Team:
      PLATFORM
    • Sprint:
      nxplatform #82, nxplatform #83
    • Story Points:
      5

      Description

      The scale endpoint provides a metric: the number of nodes to add, or to remove when the value is negative, along with an explanation of why scaling is recommended.

      When this metric is used as a utilization target for autoscaling, the target value should be 0, as described for GCP, for instance:
      https://cloud.google.com/compute/docs/autoscaler/scaling-cloud-monitoring-metrics#configure_utilization_target
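A minimal sketch of how a consumer might interpret the sign of this metric, assuming the value is an integer node delta as described above. The helper name `describe_scale` is hypothetical and not part of the Nuxeo API.

```python
def describe_scale(scale: int) -> str:
    """Interpret the signed node delta from the scale endpoint (hypothetical helper)."""
    if scale > 0:
        return f"scale out: add {scale} node(s)"
    if scale < 0:
        return f"scale in: remove {-scale} node(s)"
    # 0 matches a utilization target of 0: the deployment is already right-sized
    return "no change"


print(describe_scale(2))   # scale out: add 2 node(s)
print(describe_scale(-1))  # scale in: remove 1 node(s)
print(describe_scale(0))   # no change
```

Keeping the target at 0 means any non-zero value, positive or negative, signals that the autoscaler should act.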

      The scale value is also exposed as a regular metric.
      See the endpoint documentation for more information:
      https://doc.nuxeo.com/rest-api/1/stream-endpoint


      The collected stream introspection data provides:

      • computation lag
      • computation threads (cluster level)
      • computation throughput per node, which can be aggregated at the cluster level
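A sketch of how the introspection data listed above could be held per computation, with per-node throughput summed at the cluster level. The class `ComputationStats` and its field names are assumptions for illustration, not Nuxeo classes.

```python
from dataclasses import dataclass, field


@dataclass
class ComputationStats:
    """Hypothetical container for one computation's introspection data."""
    name: str
    lag: int      # records waiting to be processed
    threads: int  # threads running this computation across the cluster
    # records/s measured on each worker node, keyed by node id
    throughput_per_node: dict[str, float] = field(default_factory=dict)

    def cluster_throughput(self) -> float:
        # Aggregate per-node throughput at the cluster level
        return sum(self.throughput_per_node.values())


stats = ComputationStats("example", lag=100_000, threads=8,
                         throughput_per_node={"node1": 6.0, "node2": 4.0})
print(stats.cluster_throughput())  # 10.0
```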

      When there is lag, we could compute an ETA with the current configuration and propose a scale-up target to improve it.
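The ETA itself is just the lag divided by the cluster throughput. A minimal sketch, with hypothetical helper names, that also renders the result in the hours/minutes form used in the example below:

```python
import math


def eta_seconds(lag: int, cluster_throughput: float) -> float:
    """Seconds needed to drain `lag` records at the given cluster throughput (records/s)."""
    if cluster_throughput <= 0:
        return math.inf  # nothing is being consumed, so there is no finite ETA
    return lag / cluster_throughput


def format_duration(seconds: float) -> str:
    """Render whole seconds as hours and minutes, e.g. 10000 -> '2h46'."""
    hours, rem = divmod(int(seconds), 3600)
    return f"{hours}h{rem // 60:02d}"


print(format_duration(eta_seconds(100_000, 10.0)))  # 2h46
```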

      When there is no lag, we could recommend scaling down by removing some nodes.

      Example for a specific computation:

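      • lag is 100k records
      • 8 threads running (4 threads per node, 2 worker nodes)
      • throughput is 6 records/s on node 1 and 4 records/s on node 2, so 10 records/s at cluster level
      • the ETA is 100k / 10 = 10,000 s, i.e. 2h46m40s
      • the computation reads from a stream with 16 partitions, so at most 16 threads can consume it concurrently
      • we could add 8 threads, reaching that 16-thread limit, by adding 2 nodes
      • assuming throughput scales linearly with the number of threads, it should reach 20 records/s, giving an ETA of 100k / 20 = 5,000 s, i.e. 1h23m20s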
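The example above can be reproduced with a short sketch. It assumes, as the example does, that throughput scales linearly with threads and that concurrency is capped by the number of partitions; the function `propose_scale_out` is hypothetical and only illustrates the arithmetic, not the endpoint's actual algorithm.

```python
import math


def propose_scale_out(lag, threads, threads_per_node, cluster_throughput, partitions):
    """Return (nodes_to_add, expected_throughput, expected_eta_seconds).

    Hypothetical sketch: scale up to one thread per partition, assuming
    per-thread throughput stays constant.
    """
    if threads <= 0 or threads_per_node <= 0:
        raise ValueError("threads and threads_per_node must be positive")
    per_thread = cluster_throughput / threads
    extra_threads = max(0, partitions - threads)       # partitions cap useful threads
    extra_nodes = math.ceil(extra_threads / threads_per_node)
    effective_threads = min(threads + extra_nodes * threads_per_node, partitions)
    new_throughput = per_thread * effective_threads
    return extra_nodes, new_throughput, lag / new_throughput


print(propose_scale_out(100_000, 8, 4, 10.0, 16))  # (2, 20.0, 5000.0)
```

Capping at the partition count matters: threads beyond one per partition would sit idle, so adding more nodes would not improve the ETA.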

      Another example: when there is no lag on any computation and CPU usage is low on the worker nodes, propose scaling down.
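A minimal sketch of that scale-down rule. The CPU threshold, the minimum node count, and removing a single node at a time are illustrative assumptions, not Nuxeo settings; `propose_scale_in` is a hypothetical helper.

```python
def propose_scale_in(lags, cpu_usage_by_node, cpu_threshold=0.3, min_nodes=1):
    """Return -1 (remove one node) when nothing lags and CPU is low everywhere, else 0.

    lags: computation name -> lag in records
    cpu_usage_by_node: node id -> CPU usage as a fraction (0.0 to 1.0)
    """
    no_lag = all(lag == 0 for lag in lags.values())
    low_cpu = all(cpu < cpu_threshold for cpu in cpu_usage_by_node.values())
    if no_lag and low_cpu and len(cpu_usage_by_node) > min_nodes:
        return -1  # conservative: shrink one node at a time
    return 0


print(propose_scale_in({"a": 0, "b": 0}, {"node1": 0.1, "node2": 0.2}))  # -1
```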

      The endpoint could output both the current and the target deployment.
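One possible shape for such an output, sketched as a Python dict serialized to JSON. The field names are hypothetical and do not describe the real endpoint's response; see the endpoint documentation for the actual format.

```python
import json


def deployment_report(current_nodes: int, scale: int, reason: str) -> dict:
    """Pair the current and target deployment with the signed scale value (hypothetical schema)."""
    return {
        "current": {"nodes": current_nodes},
        "target": {"nodes": max(0, current_nodes + scale)},  # never below zero nodes
        "scale": scale,
        "reason": reason,
    }


print(json.dumps(deployment_report(2, 2, "lag on computation 'example'"), indent=2))
```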
