We want to better track resource usage per team.
This is not only about billing: it is also about understanding which team or job uses the most resources, and about avoiding allocating a huge number of pods needlessly.
For now, what seems clear is that we need to filter and aggregate the usage data stored in BigQuery. The natural keys are "project" and "namespace"; however, they are not enough for our needs.
As a result, we need to add labels to each resource we allocate in Jenkins X.
It should be possible to add a few labels to each resource we create.
As a first step we can propose:
- team: platform, gang, nos, devtools …
- usage: build, preview, ftest, infra (Nexus, ChartMuseum …)
- branch: master, NXP-xxx
See Labels and Selectors.
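Label values are constrained by Kubernetes: at most 63 characters, empty or starting and ending with an alphanumeric character, with only `-`, `_`, `.` and alphanumerics in between. A branch name containing a `/` would therefore be rejected if used verbatim. The following is a minimal Python sketch, assuming branch names are used as label values, that checks this rule and normalizes arbitrary strings; `sanitize_label_value` and `team_labels` are hypothetical helpers, not part of jx or Kubernetes.

```python
import re

# Kubernetes label value rule: empty, or alphanumeric at both ends with
# only '-', '_', '.' and alphanumerics in between; 63 characters maximum.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LEN = 63


def is_valid_label_value(value: str) -> bool:
    """Return True if value can be used as a Kubernetes label value."""
    return len(value) <= MAX_LABEL_VALUE_LEN and LABEL_VALUE_RE.fullmatch(value) is not None


def sanitize_label_value(value: str) -> str:
    """Turn an arbitrary string (e.g. a Git branch name) into a valid label value.

    Hypothetical normalization: forbidden characters become '-', the result is
    truncated to 63 characters, then leading/trailing '-', '_', '.' are stripped.
    """
    cleaned = re.sub(r"[^-A-Za-z0-9_.]", "-", value)[:MAX_LABEL_VALUE_LEN]
    return cleaned.strip("-_.")


def team_labels(team: str, usage: str, branch: str) -> dict:
    """Build the proposed team/usage/branch label set for a resource."""
    labels = {"team": team, "usage": usage, "branch": branch}
    return {key: sanitize_label_value(val) for key, val in labels.items()}
```

For instance, `team_labels("platform", "build", "feature/my-fix")` yields a `branch` label of `feature-my-fix`. Replacing rather than dropping invalid characters keeps the value readable, at the cost of possible collisions between branches that differ only by those characters.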
Adding labels to all the Kubernetes resources used by the platform team is not trivial, yet we have managed to add labels to the main ones: the pods running the pipelines and those used for the ARender preview.
Covering all the resources used by a dedicated Jenkins X instance (aka team) is not trivial because:
- There are many different kinds of resources.
- It's hard to always find out how/when/by whom these resources are created.
As a first step, we've managed to add labels to some of the resources related to the platform team:
They run in the platform namespace.
They run in a dedicated namespace created by the nuxeo/nuxeo pipeline, "nuxeo-unit-tests-redis-master" for instance.
They run in a dedicated namespace created by the nuxeo-arender-connector pipeline with jx preview, "nuxeo-arender-pr-100" for instance.
This includes deployments and services for each microservice as well as the nuxeo Helm chart resources.
However, here are the resources to which we did not manage, or did not have time, to add labels (and we are probably forgetting some):
Basically, what is installed by the Jenkins X platform Helm chart: mainly the Jenkins, Nexus, Docker registry, and ChartMuseum deployments.
Workaround: query on namespace="platform" AND myLabels IN (('app', 'jenkins'), ('app', 'nexus'), ...).
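The workaround comes down to a namespace filter plus a membership test on label pairs. Below is a minimal Python sketch of that logic, assuming rows shaped like the GKE usage metering export (a `namespace` string and a repeated `labels` record of `key`/`value`) and treating `myLabels` as the set of `(key, value)` pairs of a row; the helper names are hypothetical. The same function covers the Kaniko workaround by passing `{("skaffold-kaniko", "skaffold-kaniko")}`.

```python
from typing import Iterable


def label_pairs(row: dict) -> set:
    """Return the (key, value) label pairs of a usage row.

    Assumes `labels` is a list of {"key": ..., "value": ...} dicts, mirroring
    the repeated record in the usage metering tables.
    """
    return {(label["key"], label["value"]) for label in row.get("labels", [])}


def filter_usage_rows(rows: Iterable[dict], namespace: str, wanted: set) -> list:
    """Keep rows in `namespace` that carry at least one of the `wanted` label pairs."""
    return [
        row
        for row in rows
        if row.get("namespace") == namespace and label_pairs(row) & wanted
    ]
```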
Approach: we could use our own Helm chart to install the Jenkins X platform with custom labels and/or open a PR to be able to add custom labels when installing the existing chart.
Basically, any time we build a Docker image: builders, platform, ...
There doesn't seem to be a simple way of doing it.
Workaround: query on namespace="platform" AND myLabels IN (('skaffold-kaniko', 'skaffold-kaniko')).
Approach: there is an issue about adding annotations to Kaniko pods: https://github.com/GoogleContainerTools/skaffold/issues/1759. We could create a similar GitHub issue for labels.
This includes services, statefulsets, ...
Workaround: get the Redis namespaces with:
then for each namespace get all the resources with:
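A possible scripted version of this workaround, assuming the Redis namespaces can be recognized by a "redis" substring in their name, as in "nuxeo-unit-tests-redis-master" (this naming rule is an assumption). It enumerates every listable namespaced kind with `kubectl api-resources --verbs=list --namespaced -o name`, because `kubectl get all` only covers a subset of kinds (it skips ConfigMaps and Secrets, for instance). All helper names are hypothetical.

```python
import subprocess


def kubectl(*args: str) -> str:
    """Run kubectl with the given arguments and return its standard output."""
    return subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    ).stdout


def redis_namespaces(namespace_names: list) -> list:
    """Keep the namespaces that look like Redis ones (hypothetical naming rule)."""
    return sorted(name for name in namespace_names if "redis" in name)


def get_resources_args(namespace: str, kinds: list) -> list:
    """kubectl arguments listing all the given kinds in a namespace, with labels."""
    return ["get", ",".join(kinds), "-n", namespace, "--show-labels"]


def show_redis_resources() -> None:
    """Print the resources of each Redis namespace (needs cluster access)."""
    names = kubectl("get", "namespaces", "-o", "jsonpath={.items[*].metadata.name}").split()
    kinds = kubectl("api-resources", "--verbs=list", "--namespaced", "-o", "name").split()
    for namespace in redis_namespaces(names):
        print(kubectl(*get_resources_args(namespace, kinds)))
```

Calling `show_redis_resources()` requires kubectl configured against the cluster; it raises `CalledProcessError` if RBAC forbids listing one of the kinds.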
Approach: we could update our nuxeo-redis Helm chart to allow custom labels in the templates.
E.g. mongodb, postgresql, elasticsearch.
Workaround: same as Redis.
Approach: same as Redis.
The current solution is not exhaustive and feels somewhat hackish, since we need to hook into many places to add the labels and we are duplicating some code.
In the future, it should be improved with a more global and sustainable solution:
- Use a jx wrapper to inject the labels into commands such as jx step helm install or jx preview. In fact, this is what jx itself does by patching the Helm chart YAML...
- Have a Kubernetes operator handle it whenever a pod, namespace or any other resource is created from the platform namespace.
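Both options come down to the same operation: adding the team labels to each manifest before it reaches the cluster, on the resource itself and, for workloads, on the pod template so that the created pods inherit them. A minimal Python sketch on already-parsed manifests, for example dicts obtained from `helm template` output with PyYAML's `yaml.safe_load_all` (that pipeline is an assumption, not what jx does). `inject_labels` and the set of workload kinds are hypothetical; CronJobs, whose pod template sits under `spec.jobTemplate`, are left out for brevity.

```python
import copy

# Kinds whose pods are created from spec.template (hypothetical, non-exhaustive).
WORKLOAD_KINDS = {"Deployment", "StatefulSet", "DaemonSet", "ReplicaSet", "Job"}


def inject_labels(manifest: dict, labels: dict) -> dict:
    """Return a copy of a Kubernetes manifest with `labels` added.

    Labels already present in the manifest win over the injected ones, so a
    chart that sets its own `team` label is not overridden. Workload kinds
    also get the labels on their pod template.
    """
    result = copy.deepcopy(manifest)
    targets = [result.setdefault("metadata", {})]
    if result.get("kind") in WORKLOAD_KINDS:
        template = result.setdefault("spec", {}).setdefault("template", {})
        targets.append(template.setdefault("metadata", {}))
    for metadata in targets:
        # `or {}` also handles an empty `labels:` key parsed as None.
        metadata["labels"] = {**labels, **(metadata.get("labels") or {})}
    return result
```

The selector (`spec.selector`) is intentionally untouched: changing it on an existing Deployment is rejected by the API server.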
This is an example of what can be done to retrieve the resource usage for the "team: platform" label and the Kaniko pods.
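As a sketch of such a query, assuming the standard usage metering schema (a repeated `labels` record with `key` and `value`, a `resource_name` string and a `usage` record with `amount` and `unit`), the filter on the "team" label or the Kaniko label can be written with `UNNEST`. The Python helper only builds the SQL; the table name passed in is a placeholder and `@team` is a BigQuery named query parameter.

```python
def usage_query(table: str) -> str:
    """Build a BigQuery SQL query summing usage for a team label plus Kaniko pods.

    `table` is a fully qualified, trusted table name such as
    "my-project.my_dataset.gke_cluster_resource_usage" (hypothetical names).
    The team is bound at run time through the @team query parameter.
    """
    return f"""
SELECT
  namespace,
  resource_name,
  usage.unit AS unit,
  SUM(usage.amount) AS amount
FROM `{table}`
WHERE EXISTS (
  SELECT 1 FROM UNNEST(labels) AS l
  WHERE (l.key = 'team' AND l.value = @team)
     OR (l.key = 'skaffold-kaniko' AND l.value = 'skaffold-kaniko')
)
GROUP BY namespace, resource_name, unit
ORDER BY amount DESC
"""
```

With the google-cloud-bigquery client, it could be run with `client.query(usage_query(table), job_config=bigquery.QueryJobConfig(query_parameters=[bigquery.ScalarQueryParameter("team", "STRING", "platform")]))`. Grouping by unit keeps CPU and memory amounts from being summed together.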
We are not 100% sure about the difference between the gke_cluster_resource_usage and gke_cluster_resource_consumption tables.
According to https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-usage-metering#view_in_bigquery:
- gke_cluster_resource_usage => resource requests
- gke_cluster_resource_usage_consumed => resource consumption (although in our case the table is named gke_cluster_resource_consumption).
There always seems to be a delay between the results returned by the consumption table and those returned by the usage one...
The usage.amount and usage.unit fields are particularly useful.
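One way to compare the two tables is a per-namespace ratio of consumed to requested amount. A minimal sketch, assuming rows from both tables have been fetched as dicts with `namespace`, `resource_name` and `usage` fields, and that both tables report the same `usage.unit` for a given resource (worth checking before dividing). Because of the delay mentioned above, a namespace with requests but no consumption rows yet gets `None` rather than `0.0`. The helper names are hypothetical.

```python
from collections import defaultdict


def totals_by_namespace(rows, resource_name: str) -> dict:
    """Sum usage.amount per namespace for one resource (e.g. "cpu")."""
    totals = defaultdict(float)
    for row in rows:
        if row["resource_name"] == resource_name:
            totals[row["namespace"]] += row["usage"]["amount"]
    return dict(totals)


def utilization(requested_rows, consumed_rows, resource_name: str) -> dict:
    """Ratio of consumed to requested amount per namespace.

    Namespaces with no consumption rows (or a zero request) map to None.
    """
    requested = totals_by_namespace(requested_rows, resource_name)
    consumed = totals_by_namespace(consumed_rows, resource_name)
    return {
        ns: (consumed[ns] / amount if (ns in consumed and amount) else None)
        for ns, amount in requested.items()
    }
```

A low ratio for a namespace would point at requests that could be lowered, which is one of the goals stated at the top of this page.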
Here are some useful kubectl examples.
Get labels for a given pod:
Get labels for all resources of a given namespace:
Filter pods by label (-A to list the requested objects across all namespaces):
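With kubectl, `--show-labels` adds a LABELS column, `-l` takes a label selector in which comma-separated `key=value` terms must all match, and `-A` is short for `--all-namespaces`. The Python helpers below are a hypothetical sketch that builds these three commands; only `run` needs kubectl and cluster access.

```python
import subprocess


def pod_labels_cmd(pod: str, namespace: str) -> list:
    """Labels of a given pod."""
    return ["kubectl", "get", "pod", pod, "-n", namespace, "--show-labels"]


def namespace_labels_cmd(namespace: str) -> list:
    """Labels of the common resources of a namespace (kubectl get all)."""
    return ["kubectl", "get", "all", "-n", namespace, "--show-labels"]


def pods_by_labels_cmd(labels: dict) -> list:
    """Pods matching all the given labels, across all namespaces."""
    selector = ",".join(f"{key}={value}" for key, value in labels.items())
    return ["kubectl", "get", "pods", "-l", selector, "-A"]


def run(cmd: list) -> str:
    """Run a command and return its standard output (requires kubectl)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, `run(pods_by_labels_cmd({"team": "platform"}))` lists every pod labeled `team=platform`, whatever its namespace.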
This makes it possible to build interesting GKE Usage Metering reports in Data Studio based on the jx-preprod.GoogleBillingDetails.gcp_billing_export_v1_00E3A4_C28D15_595CC3 BigQuery table; see https://datastudio.google.com/datasources/1RgkQ95xH5j-070XBT6P3Nn1Qo-a5SAPN then EDIT CONNECTION.
Default report example, based on a namespace aggregation:
https://datastudio.google.com/reporting/1qZEwX6S4E51QlHlQ5X1G8z0y-mgL-HlK/page/bLKZ, see attached screenshot.
We could probably configure some fine-grained aggregations based on labels.
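As an illustration of such an aggregation, here is a minimal Python sketch that sums `usage.amount` per value of one label key (for example `team` or `usage`) and per unit, on rows assumed to follow the export schema. In a report, the same grouping would more likely be expressed in the data source query with `UNNEST(labels)`; the helper name is hypothetical. Grouping by unit avoids adding CPU and memory amounts together, and rows without the label stay visible under `(none)` instead of silently disappearing.

```python
from collections import defaultdict


def aggregate_by_label(rows, label_key: str) -> dict:
    """Sum usage.amount per (label value, unit) for a given label key."""
    totals = defaultdict(float)
    for row in rows:
        labels = {label["key"]: label["value"] for label in row.get("labels", [])}
        value = labels.get(label_key, "(none)")
        totals[(value, row["usage"]["unit"])] += row["usage"]["amount"]
    return dict(totals)
```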