Observability

Kupe Cloud provides a fully managed observability stack for metrics, logs, and alerting. Every cluster ships with core telemetry and monitoring capabilities out of the box, so you don’t need to install or operate observability infrastructure yourself.

| Component | Role | What you interact with |
| --- | --- | --- |
| Grafana Alloy | Collection agent (DaemonSet on every node) | You don’t — it works automatically |
| Mimir | Long-term metrics storage and PromQL query engine | Query metrics in Grafana or define alert rules |
| Loki | Log aggregation and search | Query logs in Grafana by namespace, pod, or label |
| Alertmanager | Alert routing and notification delivery | Configure receivers and routing |
| Grafana | Dashboards, exploration, and visualization | Build dashboards, explore metrics and logs |

The pipeline works as follows:
  1. Alloy scrapes Prometheus metrics from kubelet, cAdvisor, and any pods with scrape annotations. It also collects container logs from every node.
  2. Metrics are pushed to Mimir. Logs are pushed to Loki.
  3. PrometheusRule resources in your clusters are synced to the Mimir ruler, which evaluates your PromQL alert expressions.
  4. Firing alerts are sent to Alertmanager, which groups, deduplicates, and routes them to your configured receivers (Slack, PagerDuty, email, Teams, or webhooks).
  5. Grafana queries Mimir and Loki as data sources for dashboards and ad-hoc exploration.
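Step 1 depends on pods opting into scraping via annotations. A minimal sketch of such a pod spec, assuming Alloy honors the widely used `prometheus.io/*` annotation convention (the exact keys, pod name, and image here are illustrative; confirm the supported annotations on the Metrics page):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                      # hypothetical example pod
  annotations:
    prometheus.io/scrape: "true"    # opt this pod into metric scraping
    prometheus.io/port: "8080"      # port serving the metrics endpoint
    prometheus.io/path: "/metrics"  # path of the metrics endpoint
spec:
  containers:
    - name: app
      image: my-app:latest          # placeholder image
      ports:
        - containerPort: 8080
```

With annotations like these in place, Alloy discovers the pod automatically and its metrics become queryable in Mimir without any further agent configuration.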

Every cluster includes these without any configuration:

  • Container CPU and memory metrics from cAdvisor (per pod, per container).
  • Pod restart, scheduling, and lifecycle events.
  • Container logs indexed by namespace, pod, and container name.
  • Pre-built platform dashboards for cluster health and resource utilization.

On top of the defaults, you can add:

  • Custom dashboards deployed as ConfigMaps (see Grafana Dashboards).
  • Application metrics exposed via Prometheus scrape annotations (see Metrics).
  • Alert rules defined as PrometheusRule resources (see Alerting).
  • Notification receivers configured in the console (see Alerting).
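As a sketch of what an alert rule looks like, here is a hypothetical PrometheusRule that the platform would sync to the Mimir ruler. The CRD schema (`monitoring.coreos.com/v1`) comes from the Prometheus Operator; the metric name, labels, and threshold are illustrative assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts          # hypothetical name
  namespace: my-namespace      # hypothetical namespace
spec:
  groups:
    - name: my-app
      rules:
        - alert: HighErrorRate
          # Fire when more than 5% of requests return 5xx for 10 minutes.
          expr: |
            sum(rate(http_requests_total{job="my-app", status=~"5.."}[5m]))
              / sum(rate(http_requests_total{job="my-app"}[5m])) > 0.05
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "my-app 5xx error rate above 5% for 10 minutes"
```

Once applied to the cluster, the expression is evaluated by the ruler, and firing alerts flow through Alertmanager to whichever receivers you have configured.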

When something goes wrong, follow this sequence:

  1. The alert tells you that something needs attention and which service is affected.
  2. The dashboard shows the scope: when it started, how severe it is, and what is impacted.
  3. Metrics isolate the signal: error rates, latency spikes, resource saturation.
  4. Logs confirm the root cause: stack traces, error messages, request details.
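Steps 3 and 4 translate into ad-hoc queries in Grafana's Explore view. For example, assuming a service labelled `my-app` in namespace `my-namespace` that exports the standard `http_requests_total` counter (all names here are hypothetical), the metric and log queries might look like:

```
# PromQL (Mimir): per-status 5xx rate for the affected service
sum by (status) (rate(http_requests_total{job="my-app", status=~"5.."}[5m]))

# LogQL (Loki): recent error lines from the same pods
{namespace="my-namespace", pod=~"my-app-.*"} |= "error"
```

Running the metric query first narrows the blast radius to specific status codes or pods; the log query then pulls the matching stack traces and error messages.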