Metrics
Kupe Cloud stores tenant metrics in Mimir and exposes them in Grafana through a tenant-scoped metrics datasource. All of your managed clusters are queryable from the same place, and the cluster label is the main way to filter or compare them.
What is collected automatically
Kupe collects baseline Kubernetes and workload metrics for every managed cluster, including pod, container, deployment, namespace, and storage signals used by the built-in dashboards.
Application metrics are collected when your pods expose a Prometheus endpoint and opt in to scraping with annotations.
Query metrics in Grafana Explore
- Open Grafana from the Kupe Cloud Portal.
- Go to Explore.
- Select the tenant metrics datasource if it is not already selected.
- Start with a simple query such as `up`.
- Narrow the result with labels such as `cluster`, `namespace`, `pod`, `container`, or `job`.
- Adjust the time range before concluding that a metric is missing.
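Narrowing a result means adding label matchers to the selector. If you generate queries from a script instead of typing them into Explore, the following is a minimal sketch of how such a selector could be assembled. The helper name `selector` and the label values `prod-eu` and `checkout` are hypothetical and are not part of Kupe.

```python
def selector(metric: str, **labels: str) -> str:
    """Build a PromQL selector with exact-match label filters.

    Hypothetical helper for illustration; not a Kupe or Grafana API.
    """
    if not labels:
        return metric
    parts = []
    for name, value in labels.items():
        # Backslashes and double quotes must be escaped inside PromQL strings.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return metric + "{" + ",".join(parts) + "}"


# Example cluster and namespace names are placeholders.
print(selector("up"))
print(selector("up", cluster="prod-eu", namespace="checkout"))
```

The second call prints `up{cluster="prod-eu",namespace="checkout"}`, which can be pasted into Explore as-is.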
Useful PromQL queries
Top CPU-consuming pods

    topk(10,
      sum by (cluster, namespace, pod) (
        rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])
      )
    )

Top memory-consuming pods

    topk(10,
      sum by (cluster, namespace, pod) (
        container_memory_working_set_bytes{container!="",pod!=""}
      )
    )

Restart spikes in the last 30 minutes

    sum by (cluster, namespace, pod) (
      increase(kube_pod_container_status_restarts_total[30m])
    )

CPU throttling percentage by pod

    100 *
    sum by (cluster, namespace, pod) (
      rate(container_cpu_cfs_throttled_periods_total{container!="",pod!=""}[5m])
    )
    /
    clamp_min(
      sum by (cluster, namespace, pod) (
        rate(container_cpu_cfs_periods_total{container!="",pod!=""}[5m])
      ),
      1
    )

Expose application metrics
Expose a Prometheus endpoint from your workload and annotate the pods so Kupe can scrape it.
Preferred annotations:
    metadata:
      annotations:
        k8s.grafana.com/scrape: "true"
        k8s.grafana.com/metrics.path: "/metrics"
        k8s.grafana.com/metrics.portNumber: "8080"

Compatibility annotations are also supported:

    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "8080"

Prefer the `k8s.grafana.com/*` annotations for new workloads. The `prometheus.io/*` form is kept for compatibility.
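For the workload side, the sketch below shows one way a service could expose a Prometheus endpoint that matches the path and port in the annotations above. It uses only the Python standard library to keep the example self-contained; in practice most services would use a Prometheus client library instead. The metric name `app_requests_total`, the `method` label, and the `REQUESTS_TOTAL` counter are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real service would update it per request.
REQUESTS_TOTAL = {"GET": 0, "POST": 0}


def render_metrics(counts: dict[str, int]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Requests handled, by HTTP method.",
        "# TYPE app_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'app_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Serve only the path named in the scrape annotation.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block and serve /metrics on the port named in the scrape annotation."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Call `serve()` from the service entry point (or a background thread) and keep the port and path in step with the pod annotations.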
Label conventions in Kupe
- `cluster`: the managed cluster name, derived by the platform
- `namespace`, `pod`, `container`: the main workload dimensions for troubleshooting
- `job`: usually derived from `app.kubernetes.io/name`, with fallback to `app`
Use low-cardinality labels in your own metrics. Avoid request IDs, UUIDs, timestamps, or other values that create unbounded series counts.
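One common source of unbounded series is a label that carries a raw request path. The sketch below shows one possible way to collapse ID-like path segments before using the path as a label value. The function name `route_label`, the `:id` placeholder, and the matching heuristic are assumptions for illustration; adapt them to your own routes.

```python
import re

# Segments that look like numeric IDs or UUIDs. This is an assumed heuristic.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def route_label(path: str) -> str:
    """Collapse ID-like path segments so the label has a bounded value set."""
    # Drop the query string, then inspect each path segment.
    segments = path.split("?", 1)[0].split("/")
    return "/".join(":id" if _ID_SEGMENT.match(s) else s for s in segments)


print(route_label("/orders/12345/items"))  # /orders/:id/items
```

With this kind of normalization the number of distinct label values tracks the number of routes rather than the number of requests.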
Excluded namespaces
The `kube-system` namespace is reserved for platform-managed components and is excluded from tenant observability. Workloads deployed to `kube-system` inside your cluster will not appear in your tenant metrics datasource.
If a metric is missing
- Confirm you are in the correct tenant org and metrics datasource.
- Run `up` or `up{namespace="your-namespace"}` to verify scrape targets exist.
- Check that the workload is not deployed in `kube-system`.
- Verify the scrape annotations, path, and port on the pods.
- Expand the time range and retry.
- Switch Explore to table view if you need to inspect labels directly.
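The annotation check in the list above can be partly automated. The sketch below lints a pod's annotations, passed in as a plain dictionary, against the two annotation forms described on this page. The function name and the specific rules (the scrape value is the string `"true"`, the path starts with `/`, the port is numeric) are assumptions for illustration and may not match every rule Kupe applies.

```python
# Annotation suffixes per prefix: (scrape, path, port), as listed on this page.
ANNOTATION_FORMS = {
    "k8s.grafana.com": ("scrape", "metrics.path", "metrics.portNumber"),
    "prometheus.io": ("scrape", "path", "port"),
}


def lint_scrape_annotations(annotations: dict) -> list[str]:
    """Return a list of problems found in a pod's scrape annotations."""
    problems = []
    found = False
    for prefix, (scrape, path, port) in ANNOTATION_FORMS.items():
        scrape_key = f"{prefix}/{scrape}"
        if scrape_key not in annotations:
            continue
        found = True
        # Kubernetes annotation values are strings, so "true" must be quoted in YAML.
        if annotations[scrape_key] != "true":
            problems.append(f'{scrape_key} should be the string "true"')
        path_value = annotations.get(f"{prefix}/{path}")
        if path_value is not None and not str(path_value).startswith("/"):
            problems.append(f"{prefix}/{path} should start with '/'")
        port_value = annotations.get(f"{prefix}/{port}")
        if port_value is not None:
            text = str(port_value)
            if not (text.isascii() and text.isdigit() and 1 <= int(text) <= 65535):
                problems.append(f"{prefix}/{port} should be a port from 1 to 65535")
    if not found:
        problems.append("no scrape annotation found")
    return problems
```

An empty result means none of these assumed checks failed; it does not prove the endpoint is reachable, so still confirm with `up` in Explore.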
Next steps
- Build reusable dashboards: Grafana Dashboards
- Turn metrics into alerts: Alerting