How we page ourselves if incident.io goes down
Learn how we tackle the ultimate paradox: ensuring our alerting system pages us, even when it's the one failing. It's a common question, so let's dive into the details of our "dead man's switch", how we stress-test our systems, and why we care so much about a setup that lets us dogfood our own product.
Lawrence Jones
Organizing ownership: How we assign errors in our monolith
At incident.io, we streamline our monolith by assigning clear ownership to chunks of code and enforcing it with CI checks. Tagged errors are automatically routed to the right team, reducing on-call stress and keeping our system efficient as we scale. Here's how we do it.
Martha Lambert
Lessons from 4 years of weekly changelogs
Writing a meaningful update for customers every week has been held sacred at incident.io since we started the company. We've written over 200 of them in the past 4 years, and we recently celebrated going 2 years straight without missing a single week 🚀. Learn how we do it!
Pete Hamilton
Observability as a superpower
At incident.io, tracing is our secret weapon for catching bugs before customers do. This blog unpacks how traces and spans are built, showcasing their role in debugging and performance tuning. From span creation to integrating traces with logs and error reports, it's a practical guide for adding tracing to your observability toolkit—whether you're in development or production.
Sam Starling
Choosing the right Postgres indexes
Indexes can dramatically boost your database performance, but knowing when to use them isn’t always obvious. This blog covers what indexes are, when to use them, how to choose the right type, and tips for spotting missing ones. Whether you're optimizing queries, enforcing uniqueness, or improving sorting, you'll learn how to fine-tune your indexing strategy without overcomplicating it.
Milly Leadley
Building On-call: Our observability strategy
Our customers count on us to sound the alarm when their systems go sideways—so keeping our on-call service up and running isn’t just important; it’s non-negotiable. To nail the reliability our customers need, we lean on some serious observability (or as the cool kids say, o11y) to keep things running smoothly.
Martha Lambert
Building On-call: Continually testing with smoke tests
Launching On-call meant we had to make our system rock-solid from the get-go. Our solution? Smoke tests that continually check product health and keep us comfortable making changes at pace.
Rory Malcolm
Scoping week
A few months back, we launched On-call with a solid set of features—but that was just the start. To keep the wheels turning, we recently held a "scoping week" where we paired up, tackled ambiguities, and nailed down our project roadmap. Here's how we did it.
Leo Sjöberg
Building On-call: Time, timezones, and scheduling
Time is tricky, but building our On-call scheduler meant getting cozy with all of its quirks—and doing lots of testing. No "time" like the present to dive in!
Henry Course