Creating an on-call product is hard: it has to be rock-solid, capable of handling massive alert storms, and designed to minimize the impact of on-call on the lives of those responding. In this series, we share behind-the-scenes details of how we built our on-call product. From collaborating closely with our design partners to running rigorous load testing and reliability drills, we’ll cover the journey of developing a product that reimagines the on-call experience.
Behind the scenes: Launching On-call
We like to ship it then shout about it, all the time. Building On-call was different.
Henry Course
Building On-call: Our observability strategy
Our customers count on us to sound the alarm when their systems go sideways—so keeping our on-call service up and running isn’t just important; it’s non-negotiable. To nail the reliability our customers need, we lean on some serious observability (or as the cool kids say, o11y) to keep things running smoothly.
Martha Lambert
Building On-call: The complexity of phone networks
Making a phone call is easy...right? It's time to re-examine the things you thought were true about phone calls and SMS.
Leo Sjöberg
Building On-call: Building a multi-platform on-call mobile app
What does it take to build a greenfield mobile app in 2024? When we launched On-call earlier this year, we had to find out.
Rory Bain
Building On-call: Time, timezones, and scheduling
Time is tricky, but building our On-call scheduler meant getting cozy with all of its quirks, and doing lots of testing. No "time" like the present to dive in!
Henry Course
Building On-call: Continually testing with smoke tests
Launching On-call meant we had to make our system rock-solid from the get-go. Our solution? Smoke tests to let us continually test product health and make sure we're comfortable making changes at pace.
Rory Malcolm
How we page ourselves if incident.io goes down
Learn how we tackle the ultimate paradox: ensuring our alerting system pages us, even when it’s the one failing. It’s a common question, so let’s dive into the detail of our "dead man’s switch", how we stress-test our systems, and why we care so much about a setup that lets us dogfood our own product.
Lawrence Jones