Overview

Kupe Cloud includes a managed observability stack for metrics, logs, dashboards, and alerting. Every tenant gets a Grafana org with the core datasources and baseline dashboards already in place, so teams can start investigating workloads on day one.

This section explains what the platform collects automatically, where that data lives, and how to extend the default setup with your own metrics, dashboards, rules, and notification routes.

What you get

  • tenant-scoped metrics in Mimir
  • tenant-scoped workload logs in Loki
  • a Grafana org with metrics, logs, and Alertmanager datasources
  • baseline dashboards for clusters, namespaces, workloads, and storage
  • managed alerting and notification routing infrastructure

How it works

  1. Kupe collects baseline telemetry from every managed cluster.
  2. Metrics and logs are stored in shared backends but isolated to your tenant.
  3. Grafana exposes that data through dashboards and Explore.
  4. Alert rules are evaluated centrally and notifications are routed through Alertmanager.
  5. Your team adds app-specific metrics, logs, dashboards, and alert rules as needed.

What your team manages

  • exposing application metrics endpoints
  • emitting useful structured logs
  • creating custom dashboards when the defaults are not enough
  • defining PrometheusRule resources for app-specific alerting
  • configuring receivers and routing rules for your team
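
As an illustration of the first two items above, here is a minimal sketch of a service that exposes a metrics endpoint in the Prometheus text exposition format and writes structured JSON logs. It uses only the Python standard library. The metric name `app_requests_total`, the port, and the log fields are hypothetical, and how Kupe discovers and scrapes the endpoint is not covered here; a real service would more likely use a Prometheus client library.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter, keyed by HTTP method.
REQUESTS_TOTAL = {"GET": 0, "POST": 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Requests handled, by HTTP method.",
        "# TYPE app_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'app_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so fields stay queryable."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters on /metrics and 404 everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Configure JSON logging to stderr, then serve /metrics until interrupted."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])
    logging.getLogger("app").info("serving metrics on port %d", port)
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Writing one JSON object per line keeps fields such as `level` available to parse at query time rather than buried in free text.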

Where to start

  1. Start with Grafana Dashboards to understand cluster and workload health.
  2. Use Metrics when you need to query resource usage, latency, or error signals directly.
  3. Use Logs when you need workload-level detail for a specific incident or rollout.
  4. Add Alerting and Notifications to turn those signals into an operating workflow.
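
To make steps 2 and 3 concrete, the sketch below builds one PromQL and one LogQL query string of the kind you might paste into Grafana Explore. The metric name `app_requests_total`, the `namespace` and `app` labels, and the `level` log field are assumptions for illustration; the labels actually attached to your tenant's metrics and logs may differ.

```python
def promql_request_rate(namespace, window="5m"):
    """PromQL: per-second request rate summed across a namespace."""
    return f'sum(rate(app_requests_total{{namespace="{namespace}"}}[{window}]))'


def logql_by_level(namespace, app, level="ERROR"):
    """LogQL: parse JSON log lines and keep those at the given level."""
    return f'{{namespace="{namespace}", app="{app}"}} | json | level="{level}"'


if __name__ == "__main__":
    print(promql_request_rate("shop"))
    print(logql_by_level("shop", "checkout"))
```

The LogQL query relies on the workload emitting JSON logs; for unstructured logs, a line filter such as `|= "error"` is the usual fallback.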

In this section

  • Metrics: query and troubleshoot metrics in Grafana
  • Grafana Dashboards: use the built-in dashboards or deliver your own
  • Logs: search workload logs with the tenant Loki datasource
  • Alerting: work with managed rules and custom PrometheusRule resources
  • Notifications: send alerts to Slack, PagerDuty, Teams, email, or webhooks
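
For orientation, the custom PrometheusRule resources mentioned under Alerting follow the Prometheus Operator schema (`monitoring.coreos.com/v1`). The sketch below builds a minimal manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The alert name, expression, threshold, and severity label are hypothetical, and any labels or namespaces Kupe requires for rule discovery are not shown here.

```python
import json


def build_prometheus_rule(name, namespace, expr, for_duration="10m"):
    """Return a minimal PrometheusRule manifest with a single alert rule."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            # Hypothetical alert name and severity.
                            "alert": "HighErrorRate",
                            "expr": expr,
                            # The condition must hold this long before firing.
                            "for": for_duration,
                            "labels": {"severity": "warning"},
                            "annotations": {
                                "summary": "Error rate is above the expected level."
                            },
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    manifest = build_prometheus_rule(
        name="checkout-alerts",
        namespace="shop",
        # Hypothetical metric and threshold.
        expr='sum(rate(app_requests_total{status="500"}[5m])) > 1',
    )
    print(json.dumps(manifest, indent=2))
```

Labels such as `severity` are what Alertmanager routing rules typically match on when choosing a receiver.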