Customer SMS delivery delays in APAC-1
5 Whys
1. Customers in APAC-1 saw 5-minute delays on booking confirmation SMS.
2. Because the carrier's primary route returned 5xx errors at 14:32 UTC.
3. Because the carrier failed over without honouring our route-pinning header.
4. Because the route-pin extension is opt-in, and our APAC account wasn't enrolled.
5. Because the enrollment task slipped during the 2025 carrier-consolidation epic.
Action items
- Add a carrier-side failover health check to alerting so it fires before the customer-visible threshold is reached. Owner: Jess R · Booking · due 2026-06-10 · Open
- Stand up a secondary SMS partner and route to it automatically when delivery failures exceed 2% for 5 minutes. Owner: Marcus P · Parts · due 2026-06-21 · Open
- Document the APAC failover runbook and rehearse it in the next game-day. Owner: Sasha B · SRE · due 2026-07-04 · Open
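The second action item's rule (fail over when delivery failures exceed 2% for 5 minutes) could be expressed as a small monitor. The Python below is a hypothetical sketch, not the production implementation: the class name `FailoverMonitor`, the 60-second window used to measure the current failure rate, and the choice to stay on the secondary partner once triggered are all assumptions. It reads "for 5 minutes" as a sustained breach (similar to an alerting rule's hold duration) rather than a five-minute average.

```python
from collections import deque
from typing import Deque, Optional, Tuple

# Thresholds from the action item; the 60 s rate window is a hypothetical choice.
FAILURE_RATE_THRESHOLD = 0.02   # route away above 2% failed deliveries...
SUSTAIN_SECONDS = 5 * 60        # ...but only once the breach has lasted 5 minutes
RATE_WINDOW_SECONDS = 60        # window used to compute the current failure rate


class FailoverMonitor:
    """Hypothetical sketch: decides when SMS traffic should move to the secondary partner.

    The failure rate is measured over a rolling RATE_WINDOW_SECONDS window.
    Failover fires only once that rate has stayed above the threshold for
    SUSTAIN_SECONDS without dipping. Once triggered it stays latched; failing
    back is assumed to be a separate, manual decision.
    """

    def __init__(self) -> None:
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, failed)
        self._breach_started: Optional[float] = None
        self.use_secondary = False

    def record(self, now: float, failed: bool) -> bool:
        """Record one delivery outcome at `now` (seconds, non-decreasing).

        Returns True if traffic should be routed to the secondary partner.
        """
        self._events.append((now, failed))
        # Drop outcomes that have fallen out of the rate window.
        while self._events and self._events[0][0] <= now - RATE_WINDOW_SECONDS:
            self._events.popleft()

        failures = sum(1 for _, f in self._events if f)
        rate = failures / len(self._events)  # never empty: the newest event is kept

        if rate > FAILURE_RATE_THRESHOLD:
            if self._breach_started is None:
                self._breach_started = now  # a breach begins
            if now - self._breach_started >= SUSTAIN_SECONDS:
                self.use_secondary = True
        else:
            self._breach_started = None  # breach ended before failover fired
        return self.use_secondary
```

Fed one outcome per second at a 10% failure rate, `record` first returns `True` 300 seconds after the breach began. At a steady 1% failure rate it never does. A real deployment would probably also need a minimum number of deliveries per window, because a single failure in a nearly empty window briefly reads as a very high rate.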
Lessons learned
We over-trusted the carrier's primary route and treated a multi-region SMS partner as a single dependency. From now on, every customer-impacting carrier dependency must have an automated failover test that is exercised in the on-call game-day.
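The following minimal Python sketch shows what such an automated failover test might assert. The primary carrier is made to fail with a simulated 5xx, and the test checks that the booking confirmation is still delivered through the secondary partner. `FakeCarrier`, `CarrierError`, and `send_with_failover` are hypothetical stand-ins, and per-message fallback is only one possible routing design. A real game-day would inject the fault against staging or live carrier integrations rather than in-memory fakes.

```python
from dataclasses import dataclass, field
from typing import List


class CarrierError(Exception):
    """Raised by a fake carrier to simulate a 5xx response (hypothetical)."""


@dataclass
class FakeCarrier:
    """In-memory stand-in for an SMS partner, for failover tests only."""
    name: str
    failing: bool = False
    sent: List[str] = field(default_factory=list)

    def send(self, message: str) -> str:
        if self.failing:
            raise CarrierError(f"{self.name}: simulated 503")
        self.sent.append(message)
        return self.name


def send_with_failover(message: str, carriers: List[FakeCarrier]) -> str:
    """Try each carrier in priority order; return the name of the one that accepted."""
    errors = []
    for carrier in carriers:
        try:
            return carrier.send(message)
        except CarrierError as exc:
            errors.append(str(exc))  # remember why this carrier was skipped
    raise RuntimeError("all carriers failed: " + "; ".join(errors))


def test_confirmation_survives_primary_outage() -> None:
    primary = FakeCarrier("primary", failing=True)  # inject the primary-route 5xx
    secondary = FakeCarrier("secondary")
    used = send_with_failover("Booking confirmed", [primary, secondary])
    assert used == "secondary"
    assert secondary.sent == ["Booking confirmed"]
    assert primary.sent == []


if __name__ == "__main__":
    test_confirmation_survives_primary_outage()
    print("failover test passed")
```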