Statuses

If severities answer "how bad is this incident?", statuses answer "how close are we to getting back to normal?"

Statuses keep stakeholders up-to-date

The last thing a busy incident response team needs is a stream of stakeholders asking "do we know what's wrong here?" or "is this fixed yet?". The right set of statuses preempts those questions.

Like severities, statuses give your organization a common language for where you are in the incident lifecycle right now.

Choosing statuses that make sense for your organization

Choosing the right number of statuses is key. Too few and you'll still be fielding those disruptive questions; too many and you'll waste time deliberating whether an incident is "mostly mitigated" or "fully mitigated".

Like a child on a long car journey, start with the question "are we there yet?" The most basic set of statuses is just "Ongoing" and "Resolved". That lets the rest of your organization know whether there's a known issue right now.

That's probably enough for a small team with short-lived incidents, but as your organization and product grow in complexity, so will your incidents.

What's useful during an ongoing incident?

Let's start during the incident. The key questions the status needs to answer are "do we know what's wrong?" and "is the impact still happening?". You might solve that with:

  • "Investigating": we think something is wrong, but we're not sure what it is yet
  • "Fixing": we've figured out what's wrong and we're trying to fix it
  • "Monitoring": we think it's fixed, but want to double-check!

These statuses should be simple and clear enough that they make sense to responders and to both internal and external stakeholders.
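To make the mapping concrete, here is a minimal sketch of how the three ongoing statuses could answer the two key questions. The status names come from the list above; the class name, the ANSWERS table, and the stakeholder_summary helper are hypothetical, and the true/false answers are one reasonable reading of the descriptions rather than a fixed rule.

```python
from enum import Enum


class ActiveStatus(Enum):
    """Statuses for an incident that is still ongoing."""

    INVESTIGATING = "Investigating"
    FIXING = "Fixing"
    MONITORING = "Monitoring"


# Assumption: how each status answers the two key questions.
# "Monitoring" means we *believe* the impact has stopped, not that we are sure.
ANSWERS = {
    ActiveStatus.INVESTIGATING: {"cause_known": False, "impact_ongoing": True},
    ActiveStatus.FIXING: {"cause_known": True, "impact_ongoing": True},
    ActiveStatus.MONITORING: {"cause_known": True, "impact_ongoing": False},
}


def stakeholder_summary(status: ActiveStatus) -> str:
    """Turn a status into a one-line answer a stakeholder can read at a glance."""
    answers = ANSWERS[status]
    cause = "cause identified" if answers["cause_known"] else "cause not yet known"
    impact = (
        "impact believed ongoing"
        if answers["impact_ongoing"]
        else "impact believed over"
    )
    return f"{status.value}: {cause}, {impact}"


if __name__ == "__main__":
    for status in ActiveStatus:
        print(stakeholder_summary(status))
```

If two statuses give the same answers to both questions, that may be a sign that one of them is not pulling its weight.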

What's useful once an incident is resolved?

As your organization scales, you'll want to put in place processes to help you learn from incidents (we'll go into more detail about this later in this guide).

If you're trying to hold incident leads accountable for following the post-incident process you define, statuses can help you do that. You might split "Resolved" into different stages:

  • "Impact mitigated": things are back to normal, and it's time to start learning
  • "Debrief completed": you've met to discuss what can be learned from this incident, and any follow-ups have been assigned to the relevant team
  • "Closed": the post-incident process is over 🎊
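One way to use these stages for accountability is to treat them as an ordered sequence and list the incidents still waiting on a step. The sketch below assumes the stages are strictly linear and that incidents are tracked in a simple name-to-stage mapping; ResolvedStage, NEXT_STAGE, advance, and awaiting_debrief are hypothetical names, not part of any particular tool.

```python
from enum import Enum


class ResolvedStage(Enum):
    """Post-incident stages, in the order listed above."""

    IMPACT_MITIGATED = "Impact mitigated"
    DEBRIEF_COMPLETED = "Debrief completed"
    CLOSED = "Closed"


# Assumption: stages are strictly linear, so a debrief cannot be skipped.
NEXT_STAGE = {
    ResolvedStage.IMPACT_MITIGATED: ResolvedStage.DEBRIEF_COMPLETED,
    ResolvedStage.DEBRIEF_COMPLETED: ResolvedStage.CLOSED,
}


def advance(stage: ResolvedStage) -> ResolvedStage:
    """Move to the next post-incident stage, or fail if already closed."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.value!r} is the final stage")
    return NEXT_STAGE[stage]


def awaiting_debrief(incidents: dict[str, ResolvedStage]) -> list[str]:
    """Names of incidents whose impact is mitigated but not yet debriefed."""
    return [
        name
        for name, stage in incidents.items()
        if stage is ResolvedStage.IMPACT_MITIGATED
    ]


if __name__ == "__main__":
    # Illustrative incident names only.
    incidents = {
        "Checkout errors": ResolvedStage.IMPACT_MITIGATED,
        "Slow dashboards": ResolvedStage.CLOSED,
    }
    print(awaiting_debrief(incidents))
```

A list like the one awaiting_debrief returns gives you something specific to follow up on with each incident lead.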

In summary

When you're designing your organization's incident response process, it's helpful to think about what incident statuses will be used for. They will:

  1. Frame incident updates: do your statuses help Incident Leads send updates at the right time?
  2. Communicate with stakeholders: if all you could see about an incident was the name, a short summary, the severity, and the status, would you know whether you could help? Would executives understand the impact? Would your customer support folks be able to keep your customers updated?
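
The second question can be tried out directly: render only the name, summary, severity, and status, then ask whether the result is enough for someone outside the response team. This is a rough sketch; the IncidentSummary class, its headline format, and the example values (including the "Major" severity label) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentSummary:
    """The four fields a stakeholder might see at a glance."""

    name: str
    summary: str
    severity: str  # assumption: a free-text label such as "Major"
    status: str    # e.g. "Investigating", "Fixing", "Monitoring"

    def headline(self) -> str:
        """One line combining all four fields."""
        return f"[{self.severity}] {self.name} ({self.status}): {self.summary}"


if __name__ == "__main__":
    incident = IncidentSummary(
        name="Checkout errors",
        summary="Some customers cannot complete payment",
        severity="Major",
        status="Fixing",
    )
    print(incident.headline())
```

If the headline leaves an executive or a support agent with an obvious follow-up question, that is a hint that the statuses (or the summary) need another look.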