Metrics

Kupe Cloud stores tenant metrics in Mimir and exposes them in Grafana through a tenant-scoped metrics datasource. All of your managed clusters are queryable from the same place, and the cluster label is the main way to filter or compare them.

What is collected automatically

Kupe collects baseline Kubernetes and workload metrics for every managed cluster, including pod, container, deployment, namespace, and storage signals used by the built-in dashboards.

Application metrics are collected when your pods expose a Prometheus endpoint and opt in to scraping with annotations.

Query metrics in Grafana Explore

Open Grafana from the Kupe Cloud Portal.
Go to Explore.
Select the tenant metrics datasource if it is not already selected.
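Start with a simple query such as up, which returns 1 for each scrape target that was scraped successfully and 0 for each target that failed.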
Narrow the result with labels such as cluster, namespace, pod, container, or job.
Adjust the time range before concluding that a metric is missing.
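
If you generate label-filtered queries from scripts rather than typing them in Explore, a small helper can keep the selector syntax and escaping consistent. The following is a minimal sketch: build_selector and escape_label_value are hypothetical names, not part of Kupe or Grafana, and the label values are placeholders.

```python
from typing import Mapping


def escape_label_value(value: str) -> str:
    """Escape a value for use inside a double-quoted PromQL string."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def build_selector(metric: str, labels: Mapping[str, str]) -> str:
    """Render metric{label="value",...} using exact-match filters.

    Hypothetical helper for illustration only.
    """
    if not labels:
        return metric
    # Sort keys so the same filters always produce the same query text.
    matchers = ",".join(
        f'{name}="{escape_label_value(value)}"'
        for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


if __name__ == "__main__":
    # "prod-eu" and "shop" are placeholder label values.
    print(build_selector("up", {"cluster": "prod-eu", "namespace": "shop"}))
```

The resulting string can be pasted into Explore or used as the expression in a dashboard panel.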

Useful PromQL queries

Top CPU-consuming pods

topk(10,
  sum by (cluster, namespace, pod) (
    rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])
  )
)

Top memory-consuming pods

topk(10,
  sum by (cluster, namespace, pod) (
    container_memory_working_set_bytes{container!="",pod!=""}
  )
)

Restart spikes in the last 30 minutes

sum by (cluster, namespace, pod) (
  increase(kube_pod_container_status_restarts_total[30m])
)

CPU throttling percentage by pod

100 *
sum by (cluster, namespace, pod) (
  rate(container_cpu_cfs_throttled_periods_total{container!="",pod!=""}[5m])
)
/
clamp_min(
  sum by (cluster, namespace, pod) (
    rate(container_cpu_cfs_periods_total{container!="",pod!=""}[5m])
  ),
  1
)
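
To make the arithmetic in the throttling query concrete, the sketch below mirrors it in Python for a single pod. The input rates are made-up numbers, and throttling_percent is a hypothetical name used only for illustration. It also shows the effect of clamp_min(..., 1): the clamp avoids dividing by zero, but when the summed period rate is below one per second the denominator is treated as 1, so the reported percentage is lower than the raw ratio.

```python
def throttling_percent(throttled_rate: float, period_rate: float) -> float:
    """Mirror the query: 100 * throttled / clamp_min(periods, 1)."""
    # clamp_min(v, 1) in PromQL behaves like max(v, 1) for each sample.
    return 100.0 * throttled_rate / max(period_rate, 1.0)


if __name__ == "__main__":
    # 2 throttled periods per second out of 10 periods per second.
    print(throttling_percent(2.0, 10.0))   # 20.0
    # No CFS periods recorded: the clamp yields 0 instead of a division by zero.
    print(throttling_percent(0.0, 0.0))    # 0.0
    # Period rate below 1/s: raw ratio is 50%, clamped result is 25%.
    print(throttling_percent(0.25, 0.5))   # 25.0
```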

Expose application metrics

Expose a Prometheus endpoint from your workload and annotate the pods so Kupe can scrape it.

Preferred annotations:

metadata:
  annotations:
    k8s.grafana.com/scrape: "true"
    k8s.grafana.com/metrics.path: "/metrics"
    k8s.grafana.com/metrics.portNumber: "8080"

Compatibility annotations are also supported:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8080"

Prefer the k8s.grafana.com/* annotations for new workloads. The prometheus.io/* form is kept for compatibility.
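
For the workload side, the sketch below shows one way a pod could serve a Prometheus endpoint using only the Python standard library. It is an illustration, not Kupe's recommended implementation: the metric name demo_http_requests_total, the REQUESTS_TOTAL counter, and the serve function are hypothetical, and a production service would normally use a Prometheus client library instead of rendering the text format by hand. The path and port match the annotation values shown above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter, keyed by HTTP method.
REQUESTS_TOTAL = {"GET": 0}


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_http_requests_total Requests handled by the demo app.",
        "# TYPE demo_http_requests_total counter",
    ]
    for method, count in sorted(REQUESTS_TOTAL.items()):
        lines.append(f'demo_http_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Count every GET request, including scrapes of /metrics.
        REQUESTS_TOTAL["GET"] += 1
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve /metrics until interrupted; call this from your entrypoint."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() in the container entrypoint makes the endpoint available at /metrics on port 8080.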

Label conventions in Kupe

cluster: the managed cluster name, derived by the platform
namespace, pod, container: the main workload dimensions for troubleshooting
job: usually derived from the app.kubernetes.io/name label, falling back to the app label
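
The fallback order for job can be pictured with a short sketch. This is only an illustration of the rule described above, not the platform's actual relabeling logic. derive_job is a hypothetical name, and the behavior when neither label is present is not specified here, so the sketch returns None in that case.

```python
from typing import Mapping, Optional


def derive_job(pod_labels: Mapping[str, str]) -> Optional[str]:
    """Pick a job value: app.kubernetes.io/name first, then app."""
    for key in ("app.kubernetes.io/name", "app"):
        value = pod_labels.get(key)
        if value:
            return value
    # Neither label is set; what the platform does here is not documented
    # in this page, so the sketch only reports that no value was found.
    return None
```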

Use low-cardinality labels in your own metrics. Every distinct combination of label values creates a separate time series, so avoid label values such as request IDs, UUIDs, or timestamps that grow without bound.
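
One common way to keep a label bounded is to normalize its value before recording the metric. The sketch below assumes an HTTP service that labels requests by URL path. route_label is a hypothetical helper, and the :id and :uuid placeholders are arbitrary choices.

```python
import re

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
_UUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def route_label(path: str) -> str:
    """Replace ID-like path segments with fixed placeholders."""
    segments = []
    # Drop the query string, then inspect each path segment.
    for segment in path.split("?", 1)[0].split("/"):
        if segment.isdigit():
            segments.append(":id")
        elif _UUID.match(segment):
            segments.append(":uuid")
        else:
            segments.append(segment)
    return "/".join(segments)


if __name__ == "__main__":
    print(route_label("/orders/12345/items"))  # /orders/:id/items
```

With this normalization, the number of distinct label values tracks the number of routes rather than the number of requests.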

Excluded namespaces

The kube-system namespace is reserved for platform-managed components and is excluded from tenant observability. Workloads deployed to kube-system inside your cluster will not appear in your tenant metrics datasource.

If a metric is missing

Confirm you are in the correct tenant organization and have the tenant metrics datasource selected.
Run up or up{namespace="your-namespace"} to verify scrape targets exist.
Check that the workload is not deployed in kube-system.
Verify the scrape annotations, path, and port on the pods.
Expand the time range and retry.
Switch Explore to table view if you need to inspect labels directly.
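
The annotation check in the list above can be partly automated. The sketch below inspects a pod's annotations, using the annotation names from this page, and reports likely mistakes. The checks themselves are assumptions about common errors rather than Kupe's own validation rules, and check_scrape_annotations is a hypothetical name.

```python
from typing import List, Mapping

PREFERRED = {
    "scrape": "k8s.grafana.com/scrape",
    "path": "k8s.grafana.com/metrics.path",
    "port": "k8s.grafana.com/metrics.portNumber",
}
COMPAT = {
    "scrape": "prometheus.io/scrape",
    "path": "prometheus.io/path",
    "port": "prometheus.io/port",
}


def check_scrape_annotations(annotations: Mapping[str, str]) -> List[str]:
    """Return a list of likely problems; an empty list means none found."""
    # Check whichever annotation family the pod opts in with.
    keys = PREFERRED if PREFERRED["scrape"] in annotations else COMPAT
    problems = []
    if annotations.get(keys["scrape"]) != "true":
        problems.append(f'{keys["scrape"]} is not set to "true"')
    path = annotations.get(keys["path"])
    if path is not None and not path.startswith("/"):
        problems.append(f'{keys["path"]} should start with "/"')
    port = annotations.get(keys["port"])
    if port is not None and not (port.isdigit() and 1 <= int(port) <= 65535):
        problems.append(f'{keys["port"]} is not a valid port number')
    return problems
```

The annotations mapping can be taken from the pod's metadata.annotations field, for example after loading the output of kubectl get pod -o json.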

Next steps

Build reusable dashboards: Grafana Dashboards
Turn metrics into alerts: Alerting