Like many of our own customers, incident.io is, at its heart, a software company. That means our work is never truly “done.”
One of our primary goals is to help people coordinate their response to situations where things haven’t gone well, and make it easy to always do the right thing.
But we know that there will always be bugs to fix, features to introduce, and improvements to make, as evidenced by our changelog. And based on the feedback we get from customers, we're already doing a good job of regularly addressing all of these things.
That said, something users frequently compliment is our ability to deliver this value at pace, and they seem genuinely curious about how we're able to do this consistently.
Here, I'm going to dive into one of the things we have in place to push the pace: our Product Responder role.
I'll explain its purpose, its importance, and how it enables us to consistently delight customers by quickly resolving issues or implementing small feature requests that make our users' lives better.
A big part of why we can fix issues quickly is the “Product Responder,” a rotating role played by engineers who dedicate their week to resolving issues that get in the way of our product's reliability and affect a particular customer or group of customers.
Daily activities for this role usually involve triaging and fixing bugs, and coordinating with our customer success team to help prioritise issues and communicate with customers.
The role described above isn’t necessarily unique to incident.io. What is unique, though, is the level of experience we’re able to provide to our customers and the pace at which we can deliver.
Every software company has a version of this problem: the more successful they are, the more bugs, technical debt, and edge cases they run into.
Having a group of engineers who are always focused on reactive work has big benefits, namely:
Rather than covering the exact mechanisms or being prescriptive about tooling, I’d like to share the key factors that contribute to the success of our process.
Clarity: We use ticket creation templates that make it easy to do the right thing, we sit together with the customer success team so we can react quickly to any potential issues, and we frequently review prioritization in a cross-team stand-up.
Ensuring that everyone knows our ticket lifecycle and what “good looks like” allows us to keep tickets moving quickly from left to right.
Centralising the noise: One person within the Product Responder team is assigned the leader role. They hold the pager, which ensures that we have an in-hours, on-call individual. This allows one person to be particularly interruptible while the rest of the responders focus on tackling the issues. It also means that the out-of-hours on-call engineers have a group of people to hand over any potential incidents to during the work day.
Shared responsibility: Each team puts forward a dedicated Product Responder, and this person changes roughly every week. This means that the product’s reliability is everybody’s responsibility, so we don’t see engineers “passing the buck” to other people. No matter the area of the product, people are always willing to get involved.
Close the loop: Communicate when something is done. Whenever an issue is fixed, we let our users know. Users love hearing back when the problem they reported has been fixed. There’s huge value in “closing the loop,” and it’s especially magical if it happens within a few hours. It also means that action items rarely fall through the cracks, because we can always ask ourselves: “Have we let the customer know?” This also provides a boost to the engineer who fixed the issue!
We’re spending this week @incident_io crushing a bunch of important things from the backlog (as well as shipping 100 other things - it’s 🔥 rn).
— Pete Hamilton (@peterejhamilton) April 19, 2023
Thanks to @waltfy, our @vestaboard is cheering us on. Solid progress so far 💪 pic.twitter.com/NFocy0JKjR
Acknowledging surges in demand: Occasionally, we run into particularly busy weeks and our backlog starts to grow. Acknowledging this and taking the time to clear the backlog is something we do often. “Backlog crush weeks,” as they are known internally, are a common occurrence here. If needed, we can also ask for help from the wider engineering team.
Focus on the developer experience: This is more of a wider engineering trait, but our team invests in patterns and abstractions that allow us to standardize and speed up development, which ultimately contributes to our ability to move fast.
We are building a product that customers love. We have put a lot of emphasis on our relationship with our users and will continue to do so.
Our Product Responder process is one way in which we continue to keep a high bar for our product’s reliability, and continue to support our users even when things aren’t working as expected.
Hopefully the article above provides some insight into how this process works.
If you’d like to know more about this, feel free to reach out to us on Twitter or LinkedIn!