We currently use the term post-mortem here, despite preferring 'Incident Debrief' internally. Unless you have particularly serious incidents, the link to death here really isn’t helping anyone.
We’ve optimised for familiarity in the guide, but within the incident.io platform we allow you to configure whatever name works best for you.
A post-mortem is a structured document that captures the details of an incident, including what happened, how it was managed, and what was learned. This documentation process is essential for understanding the origins and causes of an incident, so that teams can prevent similar issues in the future.
Broadly there are a few reasons why post-mortems are written, and it's worth being cognisant of these when you're going through the process:
None of these reasons is better or worse than the others. Filing a report to satisfy a regulatory compliance requirement might be necessary for a business to function. And if you're writing for the public, putting a more positive spin on events might be necessary to avoid concern. What's important is to know which you're writing, and to be intentional about the content as a result.
For the remainder of this section, we'll assume we're writing a post-mortem aimed at helping the broader organization understand and learn from what happened.
This is relatively simple: ask yourself who can help you learn the most from the incident.
A clear and consistent template can help make post-mortems easy to write, and easy for other folks to read and digest. There are a million ways you can write one of these documents, but this is our preferred format:
A concise and complete overview of the incident, outlining what happened, how it was resolved and any key changes that have been made as a result. This section should be accessible for readers unfamiliar with the specifics, and provide enough of an overview that someone could read it and not need to go any further.
A summary of the incident with core details, including severity, affected services, impacted customers, key participants, and links to relevant resources.
The narrative of the incident in timeline form. This should be less of an audit trail, and more of a story that hits the key moments and turning points as the incident played out.
Identify the factors that contributed to the incident, such as technical failures, human factors, and external events. These aren't 'causes', but things that had to be true for the incident to manifest in the way that it did.
Highlight any aspects that helped limit the impact of the incident, which might inform positive practices to maintain. There are usually a number of things that prevent incidents from being worse, including technical controls, human adaptability and sometimes blind luck!
Summarize the key lessons you learned, and any broader risks identified, like "key person" dependencies or generalised technical issues.
If you'd like to use this format for your next write up, you can use our post-mortem template here.
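If you'd rather generate a blank skeleton yourself, the six sections described above can be sketched in a few lines of Python. This is a minimal illustration, not our actual template: the section titles and the `render_skeleton` helper are hypothetical names chosen for this example, and the prompts are paraphrased from the descriptions above.

```python
# Hypothetical section titles (labels chosen for this sketch), each paired
# with a prompt paraphrased from the section descriptions in this guide.
SECTIONS = [
    ("Summary", "What happened, how it was resolved, and key changes made as a result."),
    ("Key information", "Severity, affected services, impacted customers, key participants, links."),
    ("Timeline", "The story of the incident: key moments and turning points."),
    ("Contributors", "Factors that had to be true for the incident to manifest as it did."),
    ("Mitigators", "Things that limited the impact and are worth maintaining."),
    ("Learnings and risks", "Key lessons learned and any broader risks identified."),
]


def render_skeleton(incident_name: str) -> str:
    """Return a plain-text post-mortem skeleton with one block per section."""
    lines = [f"Post-mortem: {incident_name}", ""]
    for title, prompt in SECTIONS:
        # Each section gets its title, an indented prompt, and a blank line.
        lines += [title, f"  ({prompt})", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_skeleton("Example incident"))
```

Keeping the sections in a single list means the order and wording stay consistent across write-ups, which is most of the value of having a template in the first place.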
In some cases, the post-mortem document will form the foundation of an incident review, particularly for high-severity incidents. This meeting is an opportunity for team members to discuss the incident in detail, provide additional insights, and align on any next steps. Having a document pulled together in advance of the incident review usually leads to a far more productive meeting.
We'll cover more in the next section!