Mufflermen observability cockpit
A composed internal-facing cockpit wired from the 14 observability primitives: metric tiles in a dashboard grid at the top; a query bar, service map, log stream, and active alert rules in the middle; and a deep-dive section with the live trace, SLO cards, error budget burndown, correlation heatmap, anomaly strip, synthetic tests, and the running incident timeline.
Mufflermen observability cockpit is a reusable Oak Flats Muffler Men UI primitive with documented states, accessibility expectations, theme behavior, and implementation evidence.
- Quotes API: RPS · 1m
- Quotes API: p95 latency · 5m
- Parts catalogue: p95 latency · 5m
- Quote PDF: render p95 · 5m
- Workshop scheduler: p95 latency · 5m
- Customer SMS: delivery error rate · 5m

http.request.duration.p95{service="quotes-api", env="prod", region="au-east-1"} by (region)

| Time | Sev | Service | Message |
|---|---|---|---|
| 19:42:14.203 | | customer-sms | |
| 19:42:14.118 | | quotes-api | |
| 19:42:13.982 | | quotes-api | |
| 19:42:13.711 | | workshop-scheduler | |
| 19:42:12.504 | | quote-pdf | |
| 19:42:11.901 | | payment-gateway | |
| 19:42:10.882 | | quote-pdf | |
| 19:42:10.140 | | parts-catalogue | |

quotes-api p95 latency
quotes-api
- t-23: Brief spike on parts-catalogue p99 cascade
- t-7: PDF queue backlog caused upstream latency
- t-5: Drift outside forecast band, investigating
GET /api/quotes/health
POST /api/quotes
Browser: book service flow
POST /api/quote-pdf/render
GET /api/parts/search
ICMP edge probe
INC-2026-05-28-quote-pdf-oom
6 events
- 19:42 AEST · Detect (by alertmanager): PDF render error rate breaches SLO (4.4%). Threshold > 2% for 5m of 5m. Anomaly detection flagged a drift event 6 minutes prior.
- 19:43 AEST · Page (by PagerDuty): On-call paged for quote-pdf service
- 19:46 AEST · Acknowledge (by Sasha B): Sasha B acknowledged the page
- 19:51 AEST · Comms (by Sasha B): Internal status update posted to #incidents
- 19:58 AEST · Mitigate (by ops automation): Renderer pool resized 3 → 6; OOM threshold lifted
- 20:08 AEST · Resolve (by alertmanager): Error rate back under 0.5%, SLO restored
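
The incident timeline above implies a few response durations (time to acknowledge, mitigate, and resolve). The following is a minimal sketch of how they could be derived from the listed timestamps; the names `TIMELINE` and `minutes_since_detect` are hypothetical and not part of the cockpit, and the sketch assumes all six events fall on the same evening in AEST, with no midnight crossing.

```python
from datetime import datetime, timedelta

# Timeline entries transcribed from the incident above as (local time, stage).
# Assumption: all times are AEST on the same evening, so the date can be ignored.
TIMELINE = [
    ("19:42", "Detect"),
    ("19:43", "Page"),
    ("19:46", "Acknowledge"),
    ("19:51", "Comms"),
    ("19:58", "Mitigate"),
    ("20:08", "Resolve"),
]


def minutes_since_detect(timeline):
    """Return {stage: whole minutes elapsed since the Detect event}."""
    parsed = [(datetime.strptime(t, "%H:%M"), stage) for t, stage in timeline]
    # The Detect event is the reference point for every other stage.
    detect = next(ts for ts, stage in parsed if stage == "Detect")
    return {
        stage: int((ts - detect) / timedelta(minutes=1))
        for ts, stage in parsed
    }


elapsed = minutes_since_detect(TIMELINE)
print(elapsed["Acknowledge"], elapsed["Mitigate"], elapsed["Resolve"])  # 4 16 26
```

Under these assumptions the page was acknowledged 4 minutes after detection, mitigation landed at 16 minutes, and the incident resolved at 26 minutes. A real implementation would more likely work from timezone-aware timestamps than from the display strings.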