Interrupts are often seen as a problem that eats away at your team’s productivity and gets in the way of shipping important things for your customers. Much of that interrupt load is accrued consciously, through the tech debt we accept in order to ship features sooner. However, when a team doesn’t have a good strategy for dealing with the consequences of those decisions, the pain is felt much more acutely and much sooner.
Teams will often deal with this using a 'tick tock' approach: they halt feature work and burn down tech debt for an entire planning cycle. The decision to spend a cycle on tech debt is usually spurred either by interrupts draining too much of people’s time, or by a nasty incident that forces you to take stock and pay off some of the accrued debt. The cost is not shipping features for a while. Teams might also lean more heavily on customer support to triage and solve issues. That lengthens feedback loops and pushes critical fixes onto long product roadmaps, where they may go unresolved for months.
At incident.io we’ve tried to reframe interrupts to our advantage. We believe that by setting up our engineering team to explicitly cater for this work, we can deliver much lower-latency tactical changes that delight customers and make the entire team more productive. We can often ship bug fixes or simple features to our customers within an hour or two of them mentioning the problem.
By having a dedicated, interruptible member of the team, we can tackle things that might otherwise fall into a backlog and never be prioritised. Many customer and engineering issues can be addressed with small tactical fixes. When an issue turns out to be larger than that, someone has already spent time triaging and scoping it.
We call that role “Product Responder”, and, as with much of how we run the company, we wrote a proposal defining it. I think it does a great job of showing our thinking behind the role, so we’ve open-sourced the proposal. Feel free to give it a read.
This is also a great forcing function for knowledge distribution. By putting a newer member of the team on Product Responder (with a shadow for support), they’ll naturally be pushed to explore and solve problems in domains they might otherwise miss. Importantly, those domains are implicitly determined by what the company and our customers need, which exposes them to the stuff that matters most.
To summarise, a Product Responder is:
I don’t think what we’re doing here is necessarily novel or a magic bullet. Lots of engineering organizations have something in this shape. But I think there’s value in codifying what we’re doing, and in being explicit about what the role is and why it’s worth investing in. We’re very lucky to be the kind of business where product teams working directly with customers is scalable and effective. Doubling down on making that work as well as possible feels like a very good use of time.