Writing a post-mortem

We currently use the term post-mortem here, despite preferring 'Incident Debrief' internally. Unless you have particularly serious incidents, the link to death here really isn’t helping anyone.

We’ve optimised for familiarity in the guide, but within the incident.io platform we allow you to configure whatever name works best for you.

A post-mortem is a structured document that captures the details of an incident, including what happened, how it was managed, and what was learned. This documentation process is essential for understanding the origins and causes of an incident, so that teams can prevent similar issues in the future.

Reasons to write a post-mortem

Broadly, there are a few reasons why post-mortems are written, and it's worth being cognisant of these when you're going through the process:

  • For the people who were involved: The simplest case, where the people who were involved want to write something up to capture the analysis they've done to better understand what happened. Here, the write-up might be very technical and go into great depth, aimed at sharing deep learning with specialists.
  • To share with the broader organization: When we have incidents, many of the learnings are valuable to the broader organization. For example, within engineering there might be lessons folks can learn to help them prevent similar issues themselves. If you're writing for this purpose, it's important to pitch the narrative and level of detail so that readers outside the incident can follow along.
  • To share with the public: In some cases, and typically where there's been significant customer impact, organizations will choose to share the details of an incident more publicly. In these cases, it's common practice to be more guarded about the details.
  • To satisfy an organizational requirement: Sometimes, post-mortems need to be completed to satisfy an organizational need, like a regulatory compliance requirement saying all SEV1s must have a filed document.

None of these reasons is better or worse than the others. Filing a report to satisfy a regulatory compliance requirement might be necessary for a business to function. And if you're writing for the public, putting a more positive spin on events might be necessary to avoid concern. What's important is to know which you're writing and to be intentional about the content as a result.

For the remainder of this section, we'll assume we're writing a post-mortem aimed at helping the broader organization understand and learn from what happened.

Deciding who should write the post-mortem

This is relatively simple: ask yourself who can help you learn the most from the incident.

Key information to include

A clear and consistent template can help make post-mortems easy to write, and easy for other folks to read and digest. There are a million ways you can write one of these documents, but this is our preferred format:

Incident Summary

A concise and complete overview of the incident, outlining what happened, how it was resolved and any key changes that have been made as a result. This section should be accessible for readers unfamiliar with the specifics, and provide enough of an overview that someone could read it and not need to go any further.

Key Information

A summary of the incident with core details, including severity, affected services, impacted customers, key participants, and links to relevant resources.

Timeline

The narrative of the incident in timeline form. This should be less of an audit trail, and more of a story that hits the key moments and turning points as the incident played out.

Contributors

Identify the factors that contributed to the incident, such as technical failures, human factors, and external events. These aren't 'causes', but things that had to be true for the incident to manifest in the way that it did.

Mitigators

Highlight any aspects that helped limit the impact of the incident, which might inform positive practices to maintain. There are usually a number of things that prevent incidents from being worse, including technical controls, human adaptability, and sometimes blind luck!

Learnings and risks

Summarize the key lessons you learned, and any broader risks identified, like "key person" dependencies or generalised technical issues.

If you'd like to use this format for your next write-up, you can use our post-mortem template here.
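If your team keeps write-ups alongside code or generates them from tooling, the format above can also be expressed as a small data structure. The following is a minimal sketch only: the `PostMortem` class, its method names, and the example incident are hypothetical and are not part of any incident.io API. It simply tracks the six sections described above and flags which ones are still empty.

```python
from dataclasses import dataclass, field

# Section headings taken from the format described above.
SECTIONS = [
    "Incident Summary",
    "Key Information",
    "Timeline",
    "Contributors",
    "Mitigators",
    "Learnings and risks",
]


@dataclass
class PostMortem:
    """Hypothetical container for a post-mortem draft (illustrative only)."""

    title: str
    # Maps each section heading to the lines written for it so far.
    sections: dict[str, list[str]] = field(
        default_factory=lambda: {name: [] for name in SECTIONS}
    )

    def add(self, section: str, line: str) -> None:
        """Append a line to a section, rejecting headings outside the format."""
        if section not in self.sections:
            raise KeyError(f"Unknown section: {section}")
        self.sections[section].append(line)

    def missing_sections(self) -> list[str]:
        """Return the headings that have no content yet, in template order."""
        return [name for name in SECTIONS if not self.sections[name]]

    def render(self) -> str:
        """Render the draft as plain text, with a placeholder for empty sections."""
        parts = [self.title, ""]
        for name in SECTIONS:
            parts.append(name)
            parts.append("")
            parts.extend(self.sections[name] or ["(to be written)"])
            parts.append("")
        return "\n".join(parts)


# Example usage with a made-up incident.
draft = PostMortem(title="Post-mortem: checkout outage")
draft.add("Incident Summary", "Checkout requests failed after a config change.")
draft.add("Timeline", "10:02 Alert fired for elevated error rates.")
print(draft.missing_sections())
print(draft.render())
```

A check like `missing_sections()` is a convenient prompt for the author before sharing a draft, but it only confirms that a section has some content, not that the content is useful.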

Priming the incident review meeting

In some cases, the post-mortem document will form the foundation of an incident review, particularly for high-severity incidents. This meeting is an opportunity for team members to discuss the incident in detail, provide additional insights, and align on any next steps. Having a document pulled together in advance of the incident review usually leads to a far more productive meeting.

We'll cover more in the next section!