Why I like discussing action items in incident reviews

Are incident reviews about learning or tracking actions?

This question has sparked recent debate in incident management circles, including in my recent panel at SEV0 and in this post by Airbnb engineer Lorin Hochstein. Should the goal of an incident review be learning, or should it focus on tracking actionable improvements? When is the right time to discuss actions, and are they picked up just to make us feel better?

From my experience, learning from incidents and identifying actions are inseparable. Trying to split them into separate discussions or meetings—ostensibly to “focus on learning”—often does more harm than good.

To explore this in more depth, let’s break down what a balanced post-incident review should look like.

Running an incident review

Here’s my baseline for running a healthy post-incident review. While the specifics will vary depending on the organization’s size, culture, and the context of the incident, this is—in my experience—a framework that works well for most teams.

Incident reviews are the meetings held after the initial issue has been mitigated, often within a few days. They’re typically attended by people directly involved in the incident, along with subject matter experts and relevant stakeholders. These meetings are usually cross-functional, with representation from engineering, customer support, and sometimes leadership.

In my experience, they generally shouldn’t include people who weren’t involved in the incident and don’t have anything to contribute. The larger the “audience” in this meeting, the harder it is for folks to talk openly about their experience. It’s stressful enough walking through something that’s gone wrong without also feeling like you’re presenting to half of your engineering organization.

The goals of an incident review

The objective of an incident review is straightforward: to align the group on what happened, ensure everyone understands things a little better, and identify areas for improvement. Learning and action discussions shouldn’t be seen as competing priorities but rather as complementary parts of the same conversation.

Discussing action items

When I talk about discussing action items, I don’t mean diving deep into planning every minute detail. Nor do I mean carving out a dedicated slot in the meeting to talk about actions.

I mean exploring improvements at contextually relevant points of the review process: improvements that might include preventing recurrence, enhancing reliability, or reducing future impact.

Some examples of how these conversations might arise:

  • “There was a missing step after I deleted the pod. I think we should update the runbook to make this clearer.”
  • “I don’t think anyone outside of the DevEx team group knows that. We should share it more widely in the #engineering channel.”
  • “Given the issue only affects our version of Postgres, we should prioritize upgrading.”

Statements like these naturally arise as you discuss the incident, and they provide fertile ground for further learning and refinement. This type of “action item” discussion doesn’t derail the review; it enhances it.

Learning from action item discussions

In many cases, action items noted during the review actually spark further questions that deepen the learning experience. Consider the following questions:

  • “There was a missing step after I deleted the pod. I think we should update the runbook to make this clearer.”
    • “Do we think people should be following that runbook without a clear understanding of what each step means?”
  • “I don’t think anyone outside of the DevEx team group knows that. We should share it more widely in the #engineering channel.”
    • “How is critical information like this typically shared with the wider team?”
  • “Given the issue only affects our version of Postgres, we should prioritize upgrading.”
    • “What's our process for updating this kind of software? Is there a reason we’re on such an old version?”

By discussing action items at this level of abstraction, people are learning about the system and suggesting improvements at the same time. That feels like a pretty positive outcome.

Addressing some commonly cited concerns

With my case above made, let’s look at some specific criticisms I hear of this approach:

  • “We should focus on learning, not action items.”
    I don’t agree with the notion that these two concepts conflict. If you define “discussing action items” as surfacing suggestions as they arise contextually during the incident review, you can’t reasonably separate them from the learning process without it feeling forced and unnatural.

    If the conversation naturally sparks an idea for how something can be improved, I’d rather someone share it than keep it to themselves. Side note: it’s a little like suggesting a blameless review means you can’t mention names (spoiler: you can, and it helps to do so).
  • “Actions don’t make you safer; they lead to more incidents.”
    I don’t follow the logical conclusion here. That we should never discuss how to improve systems? All systems change over time, and it’s safe to assume most changes are net-positive. The changes we make after an incident aren’t all going to be positive, but if we’re making changes with information that suggests they’ll improve the system, I can’t see a good case for not doing them.

    For me, the goal isn’t to avoid actions but to ensure they are carefully considered and balanced against other priorities. To emphasize the point, “balanced against other priorities” means taking the actions you identify in a review and carefully weighing them against everything else on your plate before committing to do them. Recency bias can make the most recent thing feel like the most important thing, and it’s important to be aware of that.
  • “We track action items to make ourselves feel better.”
    I’ve rarely seen this in practice. Most action items are well-intentioned ideas thought of by people close to the systems, who are aiming to make tangible improvements.

    Yes, bad action items exist, and external pressures can lead to “safety theater” where you identify items solely to appease someone else. But the solution is to refine the process, not eliminate actions. No need to throw out the baby with the bathwater.

And these ones, pulled directly from Lorin’s blog post…

  • “Action item discussions are likely to be of interest to a smaller fraction of the audience.”
    In larger incidents, where multiple teams are likely to be involved, I agree this is often the case, and I expect Lorin's background in larger organizations like Netflix and Airbnb may mean he’s seen more of this than me.

    But I don’t think this makes the case not to discuss improvements; it makes a case not to discuss the details. I think it’s generally beneficial for everyone in attendance to gain some visibility of what’s being proposed and what impact it’s likely to have. Good facilitation is critically important to avoid a subset of the group going down a rabbit hole. And when you’re in smaller groups on more focused incidents, the level of interest in details typically increases.
  • “Teams are already highly incentivized to implement action items that prevent recurrence.”
    I agree, but socializing the changes that are being proposed is still helpful. Inclusion in an incident review isn’t about providing the incentives or conditions to make sure the actions happen; it’s about raising awareness of what’s being done or proposed so everyone is on the same page. And by everyone, I include people working around the systems that are being changed (so they can update their mental models), and people who need to understand and communicate changes elsewhere (compliance reporting, updating the board, etc.).

    And linking this to the point above about actions leading to more incidents: by socializing the action items in the review process, the second-order problems of these action items are much more likely to surface. For example, the platform team says, “We’re going to upgrade the database to prevent this happening again,” and the app dev team replies, “We can’t do that as there’s a legacy app that depends on this version specifically.” As a result, the database doesn’t get upgraded as planned, and the upgrade doesn’t cause another incident elsewhere.
  • “Incidents make organizations uncomfortable, and action items reassure them.”
    They do provide reassurance. But that’s not inherently a bad thing. Teams handling incidents don’t operate in a vacuum, and reassuring senior leaders, customers, and regulators is important. Tracking and communicating actions, when done thoughtfully, can be an effective way to show that progress is being made.

I’d encourage you to also read Why I don’t like discussing action items in incident reviews by Lorin Hochstein. It’s a different take on this subject, but I think there’s a reasonable overlap in philosophy.

Learning and action aren’t opposing forces

In reality, a successful incident review may have no action items—or it may have a dozen. If you’ve made the right changes before the review, there’s no need to artificially invent actions. If new insights emerge during the review, note them down.

The goal is to learn and improve—whether that’s through tracking specific actions or simply building a shared understanding and deeper knowledge of what happened.

I’ve never thought of it as a question of “learning or actions.” It’s about providing an environment where the most fruitful discussion happens. Learning can happen through action, and good actions are founded on a better understanding of the systems we operate in.

Chris Evans
Co-Founder & CPO

I'm one of the co-founders and the Chief Product Officer of incident.io.