Metrics
Kupe Cloud stores metrics in Mimir and exposes them in Grafana inside your tenant org.
All clusters in your tenant are queryable from the same Grafana metrics datasource. Use the cluster label to filter or compare clusters.
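As an illustration of filtering and comparing by cluster, the two queries below are a minimal sketch. The cluster name prod-eu is a hypothetical placeholder, not a name taken from your tenant; substitute a value that appears in your own cluster label. Run each query separately.

```promql
# Count healthy scrape targets per cluster, to compare clusters side by side.
count by (cluster) (up == 1)

# Restrict the same metric to a single cluster (hypothetical name "prod-eu").
up{cluster="prod-eu"}
```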
Querying metrics in Grafana Explore
- Open Grafana from the Kupe Cloud Portal.
- Go to Explore.
- Under Metrics, enter the metric you want to query. Let's start with:
      up
- Narrow by labels such as cluster, namespace, pod, container, or job.
- Adjust the time range and resolution before concluding that a metric is missing.
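Narrowing by labels might look like the sketch below. The values prod-eu, payments, and the api- pod prefix are assumptions for illustration only. PromQL regex matchers are fully anchored, so "api-.*" matches pod names that start with api-.

```promql
# Exact match on cluster and namespace (both values are placeholders).
up{cluster="prod-eu", namespace="payments"}

# Regex match: every pod in the namespace whose name starts with "api-".
container_memory_working_set_bytes{namespace="payments", pod=~"api-.*", container!=""}
```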
PromQL queries useful in Kupe Cloud
To test any query below in Grafana Explore:
- Open Explore.
- In the query editor, switch from Builder to Code mode.
- Paste the query.
- Click Run query (the refresh button) to execute it.
- Adjust time range and step if the chart looks sparse or noisy.
Top CPU-consuming pods (5m rate):

    topk(10,
      sum by (cluster, namespace, pod) (
        rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])
      )
    )

Top memory-consuming pods:

    topk(10,
      sum by (cluster, namespace, pod) (
        container_memory_working_set_bytes{container!="",pod!=""}
      )
    )

Restart spikes in the last 30 minutes:

    sum by (cluster, namespace, pod) (
      increase(kube_pod_container_status_restarts_total[30m])
    )

CPU throttling percentage by pod:

    100 *
    sum by (cluster, namespace, pod) (
      rate(container_cpu_cfs_throttled_periods_total{container!="",pod!=""}[5m])
    )
    /
    clamp_min(
      sum by (cluster, namespace, pod) (
        rate(container_cpu_cfs_periods_total{container!="",pod!=""}[5m])
      ),
      1
    )

Exposing application metrics
To get app metrics into Kupe Cloud, expose a Prometheus endpoint and annotate pods.
Preferred annotations:

    metadata:
      annotations:
        k8s.grafana.com/scrape: "true"
        k8s.grafana.com/metrics.path: "/metrics"
        k8s.grafana.com/metrics.portNumber: "8080"

Compatibility annotations are also supported:

    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "8080"

Both annotation families are supported. Use one family consistently.
If both are set, k8s.grafana.com/scrape controls scrape enablement, while path/port behavior can be ambiguous.
Label conventions in Kupe
- cluster: derived by Kupe from the managed cluster context and used for multi-cluster filtering.
- namespace, pod, container: primary workload dimensions for troubleshooting.
- job: often mapped from app.kubernetes.io/name when available.
Use low-cardinality labels in application metrics. Avoid per-request IDs, UUIDs, or timestamps as labels.
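To check whether a label is inflating series counts, queries along these lines can help. This is a sketch under assumptions: http_requests_total and path are hypothetical names standing in for your own metric and label, and your-namespace is a placeholder. Both queries can be expensive over many series, so scope them to one namespace where possible.

```promql
# Ten metric names with the most active series in one namespace.
topk(10, count by (__name__) ({namespace="your-namespace"}))

# Number of distinct values of one label on one metric
# (http_requests_total and path are hypothetical names).
count(count by (path) (http_requests_total))
```

A distinct-value count that keeps growing with traffic usually points to an unbounded label such as a request ID.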
If a metric is missing
- Confirm you’re in the correct tenant org and datasource.
- Run up and up{namespace="your-namespace"} to verify scrape targets exist.
- Check pod annotations for scrape enablement and the correct path/port.
- Expand the Grafana time range (for example, last 6h) and retry.
- Validate metric names and labels in table view before building final queries.
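The checks above can be complemented with the two queries sketched below. They rely only on standard PromQL behavior; my_app_requests_total is a hypothetical metric name, and your-namespace is a placeholder.

```promql
# Targets that exist but whose most recent scrape failed.
up{namespace="your-namespace"} == 0

# Returns one series with value 1 when no series with this name exists at
# evaluation time, and returns nothing when the metric is present.
# my_app_requests_total is a hypothetical metric name.
absent(my_app_requests_total)
```

An empty result from the first query together with a result from absent() suggests the target is being scraped but does not expose that metric name, so the next thing to check is the name and labels the application actually emits.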
Next steps
- Build reusable dashboards: Grafana Dashboards
- Turn queries into alerts: Alerting