Logs

Kupe Cloud collects workload logs from every managed cluster and stores them in a tenant-scoped Loki backend. Your Grafana org includes the logs datasource and a built-in Logs dashboard, so there is nothing extra to install before you can start investigating.

The platform ships container logs from tenant workloads automatically and labels them with the fields you need for investigation:

  • cluster
  • namespace
  • pod
  • container

The kube-system namespace is reserved for platform-managed components and is excluded from tenant log shipping.
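One quick way to see this exclusion in practice: a selector that returns streams for a tenant namespace returns nothing for kube-system (the cluster name `production` here is illustrative):

```logql
# Tenant namespaces return streams as usual
{cluster="production", namespace="backend"}

# kube-system is excluded from tenant log shipping, so this returns no streams
{cluster="production", namespace="kube-system"}
```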

The Logs dashboard in the Workloads folder is the fastest way to investigate an issue without writing LogQL from scratch.

Use it when you want to:

  • narrow quickly by cluster, namespace, pod, or container
  • spot spikes in overall log volume
  • compare total log volume with likely error volume
  • read recent lines for the affected workload

Use Explore when you need deeper filtering, parsing, or ad-hoc queries.

  1. Open Grafana.
  2. Go to Explore.
  3. Select the tenant logs datasource if it is not already selected.
  4. Start with a narrow selector that includes cluster and namespace.
  5. Add line filters or parsers only after you have the right log stream.
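Steps 4 and 5 typically look like this in practice: start from a stream selector, confirm it matches the right workload, and only then layer filters and parsers on top (label values and the filter pattern are illustrative):

```logql
# Step 4: narrow by cluster and namespace first
{cluster="production", namespace="backend"}

# Step 5: then add line filters and parsers
{cluster="production", namespace="backend"} |~ "(?i)timeout" | json | level="error"
```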
The automatically applied labels:

Label       Example            Meaning
cluster     production         Managed cluster name
namespace   backend            Kubernetes namespace
pod         api-7d9b5f-abcde   Pod name
container   api                Container name within the pod
Some common LogQL patterns:

# All logs from a namespace
{cluster="production", namespace="backend"}
# Error logs from pods with a matching prefix
{cluster="production", namespace="backend", pod=~"api-.*"} |~ "(?i)error"
# Logs from one container in a multi-container pod
{cluster="production", namespace="backend", container="api"}
# Parse JSON logs and filter on a field
{cluster="production", namespace="backend"} | json | level="error"
# Parse logfmt logs and filter on a numeric field
{cluster="production", namespace="backend"} | logfmt | http_status >= 500
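If your JSON logs carry fields such as level and msg, line_format can reshape each line for easier reading in Explore (the field names are assumptions about your log schema):

```logql
# Show only the level and message of each JSON log line
{cluster="production", namespace="backend"} | json | line_format "{{.level}}: {{.msg}}"
```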

You can also turn log streams into metrics:

# Top 10 pods by error count over the last 5 minutes
topk(10,
  sum by (pod) (
    count_over_time(
      {cluster="production", namespace="backend"} |~ "(?i)error" [5m]
    )
  )
)

# Per-namespace log volume in the last 5 minutes
sum by (namespace) (
  count_over_time({cluster="production"}[5m])
)
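count_over_time gives totals over a window; for a per-second rate that is easier to compare across different time ranges, rate works the same way over a log selector:

```logql
# Per-pod log rate (lines per second) over the last 5 minutes
sum by (pod) (
  rate({cluster="production", namespace="backend"}[5m])
)
```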
To make your logs easy to query, when writing application logs:

  • prefer structured logs such as JSON or logfmt
  • include stable fields such as level, service, and request or trace IDs
  • avoid logging secrets, tokens, or sensitive payloads
  • keep fields machine-readable so they can be filtered in Explore
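The guidelines above can be followed with any logging library. As one minimal sketch in Python, a formatter that emits JSON lines to stdout with stable level, service, and request-ID fields (the service name and the request-ID convention are assumptions, not platform requirements):

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line with stable, machine-readable fields."""

    def format(self, record):
        payload = {
            "level": record.levelname.lower(),
            "service": "api",  # hypothetical service name
            "msg": record.getMessage(),
        }
        # Attach a request ID when the caller passes one via `extra`.
        if hasattr(record, "request_id"):
            payload["request_id"] = record.request_id
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)  # logs are shipped from stdout/stderr
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("upstream timeout", extra={"request_id": "req-123"})
```

Because the output is plain JSON on stdout, queries like `| json | level="error"` apply to it directly in Explore.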
If you do not see the logs you expect:

  1. Confirm you are in the correct tenant org and logs datasource.
  2. Check that the workload is not running in kube-system.
  3. Narrow by cluster and namespace before adding more filters.
  4. Expand the time range if the workload is quiet.
  5. Confirm the application is writing logs to stdout or stderr.
Related:

  • Metrics: correlate a log spike with resource or error signals
  • Notifications: route alert notifications to your team