Build an incident timeline from Grafana data
When to use: Pager just fired; you want a coherent timeline before joining the call.
Prerequisites
- Grafana service-account token: create one under Grafana → Administration → Service accounts. The Editor role is enough for queries.
Flow
- What's firing
  Grafana: list firing alerts in the last 30 minutes for service=checkout.
  → list_alerts returns 1+ alerts with timestamps
- Pull the metric
  For each alert, run a Prometheus range query for the underlying metric over the last 1h. Note the breach time.
  → query_prometheus_range returns time series
- Pull logs at breach
  Loki: logs for service=checkout, level=error, [breach_time-2m, breach_time+2m]. Top patterns.
  → Log lines clustered by signature
- Compose timeline
  Build a concise timeline: alert fired → metric breach → top 3 error log patterns. Markdown for Slack.
  → Timeline ready to paste
Outcome: Coherent incident timeline assembled before you join the call.
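The post-processing in the flow (noting the breach time, deriving the log window, clustering log lines by signature, and composing the timeline) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not how the tools themselves work: every function name below is hypothetical, the threshold is whatever the alert rule uses, and the series is assumed to be a list of (unix_timestamp, value) pairs such as a Prometheus range query yields.

```python
import re
from collections import Counter
from datetime import datetime, timezone


def find_breach_time(series, threshold):
    """Return the first timestamp whose value exceeds threshold, else None."""
    for ts, value in series:
        if value > threshold:
            return ts
    return None


def log_window(breach_ts, pad_seconds=120):
    """[breach_time-2m, breach_time+2m] as (start, end) in unix seconds."""
    return breach_ts - pad_seconds, breach_ts + pad_seconds


def signature(line):
    """Collapse variable parts so that similar log lines share one signature."""
    line = re.sub(r"\b[0-9a-f]{8,}\b", "<id>", line)  # long hex-like ids
    line = re.sub(r"\d+", "<n>", line)                # remaining numbers
    return line.strip()


def top_patterns(lines, n=3):
    """Return the n most common signatures as (signature, count) pairs."""
    return Counter(signature(line) for line in lines).most_common(n)


def fmt(ts):
    """Format unix seconds as HH:MM:SS in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M:%S")


def build_timeline(alert_name, fired_ts, breach_ts, patterns):
    """Render the timeline as Slack-style Markdown, in the order the flow lists."""
    out = [
        f"*Incident timeline: {alert_name}* (times in UTC)",
        f"- {fmt(fired_ts)} alert fired",
        f"- {fmt(breach_ts)} metric breach",
        "- top error log patterns:",
    ]
    out += [f"  - {count}x `{pattern}`" for pattern, count in patterns]
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    series = [(1000, 0.1), (1015, 0.2), (1030, 0.9), (1045, 0.95)]
    logs = [
        "timeout calling payments after 3000ms req=9f8e7d6c5b4a",
        "timeout calling payments after 3100ms req=1a2b3c4d5e6f",
        "db connection refused host=10.0.0.5",
    ]
    breach = find_breach_time(series, threshold=0.5)
    print(log_window(breach))
    print(build_timeline("CheckoutHighErrorRate", 1090, breach, top_patterns(logs)))
```

The signature rules are deliberately crude: two regular expressions are enough to merge lines that differ only in ids and durations. Real log formats may need extra rules, for example for UUIDs or quoted strings.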
Pitfalls
- Loki query too broad → blows the token budget. Always include a service= label and keep the time window as narrow as possible.
- Prometheus query step too fine → returns far more points than a timeline needs. Use step=15s or step=30s for 1h windows.
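To see why the step matters, it helps to count samples. A range query evaluates at start, start+step, and so on up to end, so each series returns at most floor(window / step) + 1 points. The sketch below is illustrative only: `points_per_series` and `pick_step` are hypothetical helpers, and `max_points` is an assumed per-series budget rather than a limit taken from Grafana or Prometheus.

```python
def points_per_series(window_seconds, step_seconds):
    """Upper bound on samples per series for a range query (endpoints included)."""
    return window_seconds // step_seconds + 1


def pick_step(window_seconds, max_points, candidates=(15, 30, 60, 120, 300)):
    """Return the finest candidate step that keeps a series within max_points.

    Falls back to the coarsest candidate if none fits.
    """
    for step in candidates:
        if points_per_series(window_seconds, step) <= max_points:
            return step
    return candidates[-1]


if __name__ == "__main__":
    one_hour = 3600
    for step in (1, 15, 30):
        print(f"step={step}s -> {points_per_series(one_hour, step)} points per series")
```

For a 1h window, step=15s gives at most 241 points per series and step=30s gives at most 121, whereas step=1s would give up to 3601.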