Run incidents without leaving Slack
Tailored to your organization for smarter incident management
Confident, autonomous teams guided by automations
Learn from insights to improve your resilience
Build customer trust even during downtime
Connect everything in your organization
Streamlined incident management for complex companies
We’re fortunate enough to speak to a huge number of companies about their incident management processes. In doing so, we’ve noticed an emergent trend in how modern companies are using software to support their incident management processes, and a common set of challenges faced by them too.
When it comes to functionality, there’s a small number of key activities that most businesses need to consider:
For each of these activities, there’s a number of available tools to support, but the stack we’re seeing more often than any other uses:
Whilst this is by far the most common stack we see across organizations today, it’s not hitting the mark. In this post, we’ll go a little deeper on how each of these tools fits within the broader incident management picture, outline the pain points felt by those using this combination of products, and discuss how incident.io can bring them together in a way that makes them far more powerful than the sum of their parts.
Slack has fast become the most important component of organizations' incident response processes. It’s one of the few tools that exist universally across an entire organization, irrespective of department or job function. Since the best run incidents are founded on rock-solid communication, it follows that Slack is so prevalent.
When we look at how Slack is used in the overall picture of incident response, it’s most commonly integrated through the use of dedicated incident channels created as part of a manual workflow or through some automation.
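As a rough illustration of the "some automation" route, the sketch below derives a dedicated channel name from an incident's title and the date it was opened. The `incident_channel_name` helper and the `inc-` prefix are hypothetical, and the cleanup rules (lowercase letters, digits and hyphens, capped at 80 characters) are our assumption about what a Slack channel name should look like. Actually creating the channel would be a separate call to Slack's API, which isn't shown here.

```python
import re
from datetime import date

MAX_CHANNEL_LENGTH = 80  # assumed upper bound on channel name length


def incident_channel_name(title: str, opened_on: date, prefix: str = "inc") -> str:
    """Build a name like 'inc-2022-03-01-checkout-errors' (hypothetical scheme)."""
    # Lowercase the title and collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    parts = [prefix, opened_on.isoformat()]
    if slug:
        parts.append(slug)
    # Trim to the assumed limit and avoid ending on a dangling hyphen.
    return "-".join(parts)[:MAX_CHANNEL_LENGTH].rstrip("-")
```

A consistent, predictable name means anyone in the organization can find the right channel by searching for the prefix, whether it was created by hand or by a bot.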
Since Slack also has a rich app ecosystem, it’s common to find folks using slash commands with individual tools. Whilst it's great to be able to use these tools without leaving Slack, their interfaces are inconsistent, hard to remember, and as a result often confined to power users. Very few non-engineers are using /pd to trigger incidents, or /jira to log important follow-up work.
Top alternatives: Microsoft Teams, Discord
PagerDuty is the de facto standard for notification infrastructure. Once an organization has decided they need round-the-clock on-call cover, PagerDuty is the tool that allows them to set up on-call schedules and escalation policies, and plug them into their automated monitoring systems.
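To make the idea of an escalation policy concrete, here is a minimal sketch of the logic behind one: notify the first level, and if nobody acknowledges within that level's timeout, move on to the next. The `EscalationLevel` and `who_to_page` names are hypothetical and purely illustrative. This is not PagerDuty's data model or API, just the general shape of the concept.

```python
from dataclasses import dataclass


@dataclass
class EscalationLevel:
    responders: list[str]   # who gets notified at this level
    timeout_minutes: int    # how long to wait for an acknowledgement


def who_to_page(policy: list[EscalationLevel], minutes_unacknowledged: int) -> list[str]:
    """Return the responders for the level an unacknowledged alert has reached."""
    if not policy:
        raise ValueError("an escalation policy needs at least one level")
    elapsed = 0
    for level in policy:
        elapsed += level.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return level.responders
    # Every level has timed out: keep paging the final level.
    return policy[-1].responders
```

With a policy of primary (5 minutes), secondary (10 minutes) and manager (15 minutes), an alert that has gone unacknowledged for 7 minutes would be with the secondary responder.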
Few companies we speak to use PagerDuty outside of an engineering context, so its involvement in incidents tends to stop once a person has been ‘paged’ – i.e. they’ve received a phone call, SMS or push notification from the mobile app.
The incident management tools offered by PagerDuty don’t appear to be commonly used, with many folks citing the overly engineering-focused user experience, and the fact that they’re already communicating inside Slack, as the main reasons.
When it comes to scheduling people to be on-call and reliably making phones make noises, PagerDuty has earned the top spot.
Top alternatives: Opsgenie, VictorOps/Splunk On-call
For many organizations, sharing real-time updates on incidents with their customers is either an afterthought, or not something they think about at all. There’s a common perception that talking publicly about your incidents will lead to a lack of customer trust, when in our experience, and the experience of many of our customers, it’s exactly the opposite.
For those that are thinking about public comms, Atlassian’s Statuspage is the clear winner. Head to almost any company's status page – like status.incident.io – and you’ll see they’re using this product!
In the context of incidents, organizations are using Statuspage and its associated dashboards to communicate live status updates on an incident. The result? Happy customers who don’t leave angry messages with your customer support team!
Top alternatives: BetterUptime, Status.io
Whether it’s during the incident response process, or as part of the output of an incident debrief, team members involved in the incident need a place to log any post-incident work that needs to be done.
Whilst there’s a number of new kids on the block (we use and love Linear 😉), Jira remains the most commonly used issue tracker. Whether you’re using their cloud-based SaaS offering or running on-premises, Jira is the issue tracking tool where teams are organising their work.
Aside from tracking work, we also see Jira being used for incident tracking, running help desks, and by IT teams as part of their overall ITSM process. It turns out when your product is based on moving tickets through a workflow, you can use it to cater for pretty much anything! We’ll reserve commenting on how end-users talk about their love of the product though.
Top alternatives: GitHub Issues, Linear, Shortcut, Asana, ServiceNow
When the dust has settled and you’ve managed to get things back online, it’s common for organizations to run an incident debrief process. For many this looks like digging into contributing factors (aka root causes) and facilitating a learning process.
The common approach we see is to have a template in a Google Doc that gets re-used for each new incident. This’ll include key incident information, commentary on the lifecycle of the incident, metrics and visualizations, and anything else that helps folks to understand what went on.
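For a sense of what such a re-usable template boils down to, here is a small sketch using Python's standard `string.Template`. The section headings and field names (`title`, `severity`, `timeline` and so on) are our own illustrative choices rather than a recommended format, and `render_debrief` is a hypothetical helper.

```python
from string import Template

# Hypothetical debrief skeleton; the sections are illustrative, not prescriptive.
DEBRIEF_TEMPLATE = Template("""\
Incident debrief: $title
Severity: $severity
Duration: $duration

Summary
$summary

Timeline
$timeline

Contributing factors
$contributing_factors

Follow-up actions
$follow_ups
""")


def render_debrief(**fields: str) -> str:
    """Fill in the known fields, leaving unknown ones as visible $placeholders."""
    # safe_substitute leaves missing fields untouched instead of raising KeyError,
    # so gaps in the write-up stay obvious to whoever picks the document up next.
    return DEBRIEF_TEMPLATE.safe_substitute(**fields)
```

Calling `render_debrief(title="Checkout errors", severity="Major")` fills in the first two lines and leaves the rest of the placeholders in place for the debrief owner to complete.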
Top alternatives: Confluence, Notion, Slab
What’s interesting here is the absence of ergonomic connectivity between these tools. You can connect PagerDuty to Slack, and have Jira auto-escalate to PagerDuty, but there’s a missing substrate here that joins up the various software solutions. This results in one of two things happening:
incident.io is a collaboration tool that exists within Slack during your incident response process. We natively integrate with the entire stack outlined above (plus many other products!) in a way that allows the entire organization to engage with the process. Incidents, after all, aren’t confined to engineers, and with more people engaging in the process we can leverage more expertise and experience to resolve things faster and more easily.
When it’s 2am and your service is offline, nobody wants to be scrabbling around for their login details to access an unfamiliar system, nor do they want to use a myriad of complicated slash commands to run each tool individually through Slack.
With incident.io you can:
Once you’ve run an incident with incident.io, you’ll realise just how enjoyable it can be to have things going wrong in your organization. Ok, enjoyable might be a stretch, but your incidents will certainly require less effort whilst also delivering higher quality and more consistent output!
Don’t believe us? See what our customers are saying instead!
I'm one of the co-founders and the Chief Product Officer of incident.io. I've spent my whole career working in engineering.