Incidents are for everyone

Incident management tools cover engineers and SREs. What about everyone else?

If you’re an engineer or SRE, chances are you’re using a dedicated incident management platform to respond to incidents—and thankfully so.

These tools give teams structure around their response processes and help them meaningfully improve those processes. As a result, their impact can be felt almost immediately after rollout.

Faster resolution times. Less downtime. Improved customer communications. All of which ultimately lead to happier customers and better retention, which trickles right into revenue.

In a world without incident management tools, teams would be much worse off, not just in their daily workflows but across the organization as a whole.

Think about it.

When an incident is triggered, what would you do? Sure, you may have Slack or Teams as a place to communicate, but then what? Who gets pulled in? Who’s the lead? What are the action items? What about follow-ups? Who’s in charge of external comms? Who gets notified? The list goes on and on. So it feels reasonable to say that, with an incident management solution in place, life is much better for engineers and SREs. But what about everyone else?

You may be thinking, “What do you mean what about everyone else? They don’t deal with incidents.” They sure do—they just aren’t calling them that. As a result, they’re missing out on many of the benefits that come with using an incident management solution.

What are incidents, really?

Let’s first define what an incident is, because it’s core to this argument. In our Incident management guide, we define incidents as:

…anything that takes you away from planned work with a degree of urgency.

Notice that we intentionally left this pretty open. Partly, that’s because every organization is different, so what counts as an incident will vary quite a bit from team to team. But it’s also because incidents can truly be anything.

First, we have to challenge the assumption that incident management tools are only meant to help manage incidents. We also have to challenge the idea that incidents can only be technical issues that engineers and SREs deal with on a daily basis.

Let’s use a few hypothetical situations.

Imagine you have a former employee who’s breached their contract and has threatened to disclose confidential information that could jeopardize the future of your business. Is this a big deal? Absolutely. Potential information breaches keep people up at night. Think about how you might react in this situation. You would need to respond to the breach with the level of severity it deserves: this is a critical situation that requires an all-hands-on-deck approach. You would then coordinate and rally everyone who needs to be involved, each with a specific role to play.

Once the fire is out, you might even come together to figure out how to avoid situations like these in the future. Data breaches are a PR nightmare, so you’d want to make sure you’re putting as many resources as possible towards avoiding them entirely. Let’s call these…follow-up actions.

Now, take a typical technical incident declaration and response process. An engineer finds an issue causing a website outage. In this case, they would declare an incident using their tool of choice, and teams would then coordinate to resolve the incident and communicate throughout the process. They would then come together for a debrief to figure out how they can avoid incidents like these in the future.

...the first scenario is starting to sound a lot like an incident, isn’t it?

If you can answer “Yes” to at least two of the following questions, you likely have an incident at hand:

Does the problem you’re facing risk a negative impact on the product, the business or your customers?

Is a checkout page down on Black Friday? Is someone spreading false claims about your company on social media, and are those claims generating attention? Both of these can and should be considered incidents.

Do you need to respond to the problem urgently, outside of your planned work?

Not everything needs a response right away. But if it does, there’s a good chance it’s an incident.

Do you need to coordinate between multiple people or departments?

If you find yourself needing to coordinate and communicate with several folks across your org, you probably have an incident at hand.

Do you need to communicate with the rest of the organization, or with your customers?

If you find yourself thinking “We need to send out comms,” you’re probably in the middle of an incident.

Is this something you want to discuss and review afterwards to extract learnings?

No one wants to deal with incidents, but they are great learning opportunities. If the issue at hand presents an opportunity to learn and build resilience, it’s likely an incident.
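To make the “at least two yes answers” rule of thumb concrete, here’s a minimal Python sketch. The question keys and the `looks_like_incident` function are hypothetical names made up for illustration; they aren’t part of incident.io or any other tool, and the default threshold simply mirrors the guideline above.

```python
# Hypothetical sketch of the "two or more yes answers" rule of thumb.
# The keys below are our own labels for the five checklist questions.
QUESTIONS = (
    "negative_impact",  # Risk of negative impact on product, business or customers?
    "urgent",           # Needs an urgent response outside of planned work?
    "coordination",     # Needs coordination between people or departments?
    "communication",    # Needs comms to the wider org or to customers?
    "learning",         # Worth reviewing afterwards to extract learnings?
)


def looks_like_incident(answers: dict, threshold: int = 2) -> bool:
    """Return True if at least `threshold` checklist questions are answered yes.

    Questions missing from `answers` are treated as "no".
    """
    unknown = set(answers) - set(QUESTIONS)
    if unknown:
        raise ValueError(f"Unknown questions: {sorted(unknown)}")
    yes_count = sum(1 for question in QUESTIONS if answers.get(question, False))
    return yes_count >= threshold
```

For the former-employee scenario, answering yes to `negative_impact`, `urgent` and `coordination` returns `True`, while a single yes on its own does not.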

More incidents, more problems?

Through all of this, it’s fair to acknowledge that leaning into this philosophy will result in many more incidents. However, consider the alternative. If you reserved the label of “incident” for only the most severe issues, your opportunities to learn would shrink accordingly.

Every single incident, big or small, is an opportunity to create better processes, build a better product, provide better services and build better relationships with your customers.

By broadening your definition of what an incident is, and lowering the bar for how impactful something needs to be before you declare one, you widen your window of opportunity for learning.

Practice, practice, practice!

Asking teams to use incident management tools they aren’t accustomed to will be tricky at first. And asking engineers to use these tools in ways they aren’t used to will also be a bit challenging.

But remember, practice makes perfect.

The best way to embed a process into your organization is to use it repeatedly.

This helps everyone learn the process and get better at incident response overall, so that when something really bad happens, your response runs like a well-oiled machine.

Incident management tools are useful for engineers to manage non-incidents, too

So far, we’ve highlighted why it makes sense for teams outside of engineering to use an incident management solution. But what about engineering teams dealing with non-incident issues? This is another area where incident management workflows just make sense. We’ve used incident.io during big projects like infrastructure migrations and Game Days, all with great success, and we’ll dive into these a bit more below.

“One of the unexpected benefits of switching over to incident.io is that we’ve managed to get different groups within our organization to manage incidents. For example, teams that look after our data platform, employee enablement folks that manage our laptops and other equipment our company uses.

These are teams that needed a platform to help them manage incidents. Now they’re empowered to set up incident channels and feel more confident knowing that there’s automation to help guide them the entire way.”

And it isn’t just us. Our customers also use incident.io for consequential events, ranging from product launches to routine maintenance and even load testing.

These use cases underscore one crucial fact: if you need a space to collaborate on issues, genuine or planned, incident or otherwise, an incident management solution like incident.io is a smart bet.

Are you managing a complex operational task where you want to keep track of timelines, actions, and follow-ups? An incident management tool can help you do just that.

Use case

Migration to Google Cloud

We recently migrated our infrastructure to Google Cloud and used incident.io the entire time. Using incident.io made a task that is typically difficult and stressful much simpler. We were able to coordinate seamlessly, assign and track action items, and give regular updates to keep everyone in the loop, all with only minutes of downtime.

In the end, the migration went well, and using incident.io played a significant role in that. But had things gone wrong, we’d have been primed with all of the context of the migration and ready to tackle things at speed.

Use case

Product launches

Product launches can be a huge moment for organizations, and one where product, engineering, marketing, customer success, and leadership all need to work together to make the launch a success.

With an incident management tool in place, engineering teams can declare an “incident” to help them navigate and communicate through each of the launch items.

Who’s pressing “Go” on the feature? Who’s in charge of managing any post-launch bugs? Who’s in charge of customer communications? With an incident management tool in place, teams can navigate these pivotal moments more seamlessly than they otherwise would.

The result? A product launch that’s far more likely to play out without issue.

Use case

Game Days

At incident.io, we run Game Days as a way to test our operational efficiency during incidents. During these simulations, we run planned incidents to gauge how our incident response processes are faring. These Game Days help us answer questions such as “How efficient are our processes?”, “Are there any areas we feel we can improve or streamline?”, “Does this feel right?” and more.

To keep everyone on their toes, we’ll also throw realistic curveballs to simulate how quickly things can change during real-life incidents.

Of course, we’re using incident.io every step of the way to track statuses and follow-up items, communicate with team members and more.

The case for incident management beyond engineering teams

While there are plenty of use cases for an incident management tool among engineers outside of core incident response, these tools can be a key part of how other teams work as well.

Customer support—on the front lines of incidents

Your customer success teams play a very important role in your organization, and they can also play a crucial role on the front lines of incidents. We aren’t here to tell you that CS will suddenly become a team full of incident leads, but they can and should be encouraged to declare incidents sensibly.

These teams are already fielding loads of requests from customers: bug fixes, feature requests and so on. Instead of chucking these into an issue tracker like Jira or Linear and hoping someone gets to them eventually, they can just declare an incident.

But it goes beyond that. Since customer success teams are on the front lines, they often have the most accurate understanding of the issues at hand; remember, they’re talking to customers every single day. Because of this, they have a ton of crucial context that others won’t, making them the perfect folks to both declare incidents and manage external comms, for example via a status page. Working hand-in-hand with engineers during incidents is crucial to practicing good transparency and keeping everyone in the loop, internally and externally.

By encouraging CS to declare incidents—and work with engineers to manage them—your response teams can resolve issues that much faster.

An incident declared by one of our Customer Success team members, Lucy Jennings

Executive teams—ultimate visibility

While no one likes to deal with major incidents, the reality is that they do happen. When incidents of this severity strike, it’s likely that your executive team will want to be in the loop about their status and any progress towards resolution.

At the same time, responders don’t want to drop what they’re doing (in this case, responding to an incident) to give executives the updates they need to make consequential decisions that affect everyone.

With an incident management solution like incident.io, keeping tabs on incidents becomes that much easier for relevant stakeholders. Executives no longer need to chase anyone for a status update. No more tracking down disparate Slack channels, some of which may even be private!

They can just drop into an incident channel, or be added to one automatically via a Workflow for incidents of certain severities or ones that affect certain surfaces. Incident responders can even share when people can expect the next status update, whether that’s every 15 or 30 minutes, and then be nudged if that update doesn’t arrive on time.
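As an illustration of the idea behind update nudges, here’s a minimal Python sketch that checks whether a status update is overdue given an expected cadence. It’s a simplified assumption, not how incident.io implements reminders: the `nudge_message` function and its message format are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def nudge_message(last_update: datetime, cadence: timedelta, now: datetime) -> Optional[str]:
    """Return a reminder if the next expected status update is overdue, else None.

    Hypothetical helper: `cadence` is the promised interval between updates
    (e.g. 15 or 30 minutes), and `last_update` is when the last one was shared.
    """
    due_at = last_update + cadence
    if now <= due_at:
        return None  # Still within the promised window, so no nudge yet.
    overdue_minutes = int((now - due_at).total_seconds() // 60)
    cadence_minutes = int(cadence.total_seconds() // 60)
    return (
        f"Status update overdue by {overdue_minutes} min "
        f"(expected every {cadence_minutes} min)"
    )


# Example: last update at 10:00 UTC with a 30-minute cadence.
last = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
print(nudge_message(last, timedelta(minutes=30), last + timedelta(minutes=45)))
```

With a 30-minute cadence and a last update at 10:00, checking at 10:20 returns `None`, while checking at 10:45 returns a reminder that the update is 15 minutes overdue. Using timezone-aware datetimes avoids surprises when responders work across time zones.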

And all of this can happen without a hitch. Executives don’t have to chase the context they need and responders can focus on resolving the incident. A win-win!

An incident declared by one of our co-founders, Chris Evans

Security teams—managing breaches or security events that much better

Mistakes happen. Even worse, breaches happen. Regardless of which situation you’re dealing with, whether it’s an information leak or a misplaced laptop, using an incident management solution works well to track all associated actions and follow-ups.

These are situations that require multiple steps and, oftentimes, several stakeholders. With a single place to manage tasks and communicate, resolution times can shrink and crises can be averted entirely.

And because, by nature, security incidents can be highly sensitive, you may want to limit who gets looped in and when. With a solution like incident.io, not only can you respond to security incidents much more seamlessly, you can also keep them private if needed. Peace of mind, privacy and security for everyone involved.

An incident declared by one of our co-founders, Pete Hamilton

RevOps—happy customers, healthy dashboards

Revenue is always top of mind for businesses. But if your financial dashboards are inaccessible or have errors, it can be difficult to get an accurate picture of how your business is doing. At incident.io, both our Data and RevOps teams declare incidents any time frequently accessed dashboards, like the ones that show our ARR, are affected.

This ensures the rest of the org, from Sales to the C-suite, is in the loop and knows that someone is taking care of the issue.

An incident declared by one of our Data team members, Matilda Hultgren

Incidents truly can be for everyone

In the end, we want companies to fundamentally reimagine the way they think about incident management tools like incident.io.

They’re much more than just a tool for engineers. They also have legitimate use cases outside of incident response, many of which we outlined above. But there’s still so much untapped potential.

The reality is you’re already dealing with situations every single day that would benefit from process improvement. When you democratize the use of incident management across your organization, you’ll save time, be more efficient, and ultimately protect your customers better.

Incidents are for everyone. Let’s start treating them that way.