Overview
Kupe Cloud includes a managed observability stack for metrics, logs, dashboards, and alerting. Every tenant gets a Grafana org with the core datasources and baseline dashboards already in place, so teams can start investigating workloads on day one.
This section explains what the platform collects automatically, where that data lives, and how to extend the default setup with your own metrics, dashboards, rules, and notification routes.
What you get automatically
- tenant-scoped metrics in Mimir
- tenant-scoped workload logs in Loki
- a Grafana org with metrics, logs, and Alertmanager datasources
- baseline dashboards for clusters, namespaces, workloads, and storage
- managed alerting and notification routing infrastructure
Operating model
- Kupe collects baseline telemetry from every managed cluster.
- Metrics and logs are stored in shared backends (Mimir and Loki), with each tenant's data isolated from every other tenant's.
- Grafana exposes that data through dashboards and Explore.
- Alert rules are evaluated centrally and notifications are routed through Alertmanager.
- Your team adds app-specific metrics, logs, dashboards, and alert rules as needed.
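App-specific metrics, the last item above, are usually exposed by the application itself over HTTP in the Prometheus text exposition format. The following is a minimal sketch using only the Python standard library. The metric name `shop_orders_processed_total`, the `/metrics` path, and port 8000 are illustrative assumptions, and how Kupe discovers the endpoint for scraping is not shown here. In practice a Prometheus client library would normally replace the hand-written `Counter` class.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """Thread-safe, monotonically increasing counter."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount=1):
        with self._lock:
            self._value += amount

    def render(self):
        """Return the counter in Prometheus text exposition format."""
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Hypothetical metric name, chosen for illustration only.
ORDERS = Counter("shop_orders_processed_total", "Orders processed since start.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = ORDERS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given port (assumed value)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Application code would call `ORDERS.inc()` wherever an order completes and run `serve()` in a background thread or a sidecar process.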
What your team is responsible for
- exposing application metrics endpoints
- emitting useful structured logs
- creating custom dashboards when the defaults are not enough
- defining PrometheusRule resources for app-specific alerting
- configuring receivers and routing rules for your team
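To make the PrometheusRule item above more concrete, here is a hedged sketch that builds a rule manifest as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The `apiVersion`, `kind`, and `spec.groups[].rules[]` layout follow the Prometheus Operator resource; the resource name, namespace, alert name, threshold, and the `http_requests_total` metric are hypothetical. Any labels or namespaces Kupe requires for rule discovery are not covered here.

```python
import json


def prometheus_rule(name, namespace, alert, expr, for_duration, severity, summary):
    """Build a single-alert PrometheusRule manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,
                            # How long the condition must hold before firing.
                            "for": for_duration,
                            "labels": {"severity": severity},
                            "annotations": {"summary": summary},
                        }
                    ],
                }
            ]
        },
    }


# All values below are illustrative assumptions, not Kupe defaults.
rule = prometheus_rule(
    name="shop-api-alerts",
    namespace="shop",
    alert="ShopApiHighErrorRate",
    expr=(
        'sum(rate(http_requests_total{job="shop-api",code=~"5.."}[5m]))'
        ' / sum(rate(http_requests_total{job="shop-api"}[5m])) > 0.05'
    ),
    for_duration="10m",
    severity="warning",
    summary="shop-api 5xx ratio above 5% for 10 minutes",
)

manifest = json.dumps(rule, indent=2)
print(manifest)
```

The `severity` label is a common convention for routing in Alertmanager; whether your team's routing rules match on it depends on how your receivers are configured.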
A good starting path
- Start with Grafana Dashboards to understand cluster and workload health.
- Use Metrics when you need to query resource usage, latency, or error signals directly.
- Use Logs when you need workload-level detail for a specific incident or rollout.
- Add Alerting and Notifications to turn those signals into an operating workflow.
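Workload logs are much easier to narrow down during an incident when each line is a single JSON object. The sketch below shows one way to do that with Python's standard `logging` module. The field names (`ts`, `level`, `logger`, `msg`) and the `fields` convention for extra attributes are assumptions made for this example, not a format Kupe requires.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one single-line JSON object."""

    def format(self, record):
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Merge structured fields passed as extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


def get_logger(name):
    """Return a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger


log = get_logger("checkout")
log.info("order placed", extra={"fields": {"order_id": "A-1001", "duration_ms": 42}})
```

With logs in this shape, a LogQL query in Grafana Explore can parse and filter on fields, for example `{namespace="shop"} | json | level="error"`. The `namespace` label here is an assumption about which stream labels are attached to your logs.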
Pages in this section
- Metrics: query and troubleshoot metrics in Grafana
- Grafana Dashboards: use the built-in dashboards or deliver your own
- Logs: search workload logs with the tenant Loki datasource
- Alerting: work with managed rules and custom PrometheusRule resources
- Notifications: send alerts to Slack, PagerDuty, Teams, email, or webhooks