You've just made it through a particularly tough incident.
It was a short outage affecting a subset of customers, so not exactly the end of the world, but bad enough that it involved multiple people across a number of teams to resolve. Either way, the incident was well managed, and the dust has settled.
Now what?
Most guidance would say that putting together a post-mortem document is a good idea, given the severity of the incident. You've also done this, so what's next?
It's time to prepare and run your post-mortem meeting! But it isn't enough to go into a room with the people involved and wish for the best. No, there are a few things that you need to keep in mind before, during, and after your post-incident meeting to get the most out of it.
What is a post-mortem meeting?
An incident post-mortem meeting is held after an incident. This is a space where a small group of responders and other stakeholders discuss the incident, its contributing factors, and its impacts.
The goal is to go beyond the post-mortem document and discuss the incident in a live setting. The meeting is far less reactive than the incident response, and it's focused on creating better long-term outcomes.
A note on blameless post-mortems
While we're on the topic of post-mortems, it's worth diving into the concept of blamelessness since you'll hear it coming up a lot. If you're looking for a deeper dive, we wrote extensively about the concept of blameless post-mortems here.
The thinking is that incidents and any post-incident activities should be free of finger-pointing. Instead of focusing on the "who" behind an incident, a blameless culture calls for understanding the "why" and "how," looking at the many causes of an incident, also known as its contributing factors.
By doing this, you remove much of the anxiety and stress around post-incident processes.
Setting the scene for a productive post-incident meeting
To facilitate a productive post-mortem meeting, it's important to establish a few ground rules. These can help promote open and honest discussion, give everyone equal time to raise concerns, and prioritize addressing critical issues.
- Set the tone of blameless accountability: First, creating an environment of blameless accountability is crucial, since it keeps the focus on understanding contributing factors rather than pointing fingers at anyone. Remember: if discussing a specific person's actions helps the team learn, don't shy away from it.
- Take notes: Second, staying on topic and taking good notes during the meeting helps maintain focus and makes capturing important details a bit easier. Pro tip: the person running the meeting should not be the person taking notes. Their focus should be on facilitating.
- Set expectations for learning: Lastly, setting expectations at the beginning of the meeting, emphasizing learning and improvement, and promoting open and respectful discussion can help create a positive and productive environment for everyone.
General guidance for running post-mortem meetings
There’s a myriad of ways you can run a post-mortem meeting, but let’s look at some best practices that most folks agree are a good starting point.
Get the right people together
First things first—it's important to include all relevant stakeholders in the post-mortem meeting, including any incident responders, key decision-makers, and individuals directly involved in the incident. The goal is to make sure that every perspective and nuance is accounted for. Remember, everyone will have a unique take on the incident worth factoring in.
Have an agenda
Before the meeting, set a clear agenda outlining some of the objectives and topics you want to discuss as a team. This helps keep the meeting focused and ensures that all agreed-upon talking points are covered.
It doesn't need to be a rigid agenda, and it's entirely expected (and encouraged!) that you'll end up talking about other topics, but going in with some pre-work and thoughts on areas of interest can be incredibly helpful.
Your post-mortem document is a good place to start!
Review your incident, preferably using a post-mortem document
During the meeting, the incident should be reviewed from end to end, including the timeline of events, the impact on the organization, and the actions taken to mitigate the incident. This allows the team to gain a complete understanding of what happened.
Discussion should be actively encouraged during the review. Remember, the goal is to learn, not recite the contents of the document. If people are less forthcoming with contributions, it can be helpful to ask open questions to gather additional perspectives.
Note any contributing factors
Next, try to identify any contributing factors to the incident. This involves analyzing technical issues, human error, or process gaps. By identifying these, teams can implement process and technical improvements to help prevent similar incidents in the future.
Discuss lessons learned
The post-mortem meeting should provide an opportunity for open and honest discussions about the lessons learned from the incident. This includes identifying areas for improvement, discussing best practices for resolving incidents like these, and sharing knowledge among team members.
Align on follow-up actions
Based on the insights gained from the meeting, a follow-up plan should be created to address the identified issues and prevent future incidents. This plan should include specific tasks, owners, and timelines.
A note on actions
It can be tempting to create long lists of actions in a post-mortem meeting. At the point of reviewing the incident, these actions are likely to feel incredibly important. This is recency bias at play!
To counter this, we’d encourage you to avoid committing to actions until the owning team has had time to understand the plans and weigh up the relative importance against other planned work.
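If you track follow-ups in code or a script, one way to reflect this is to treat every action raised in the meeting as a proposal until the owning team commits to it. The sketch below is purely illustrative: the `FollowUp` class, its fields, and the example tasks, team name, and date are all hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class FollowUp:
    """One follow-up action raised in a post-mortem meeting (hypothetical model)."""

    task: str
    owner: Optional[str] = None  # set once a person or team takes the work on
    due: Optional[date] = None   # agreed only when the owning team commits
    committed: bool = False      # every action starts life as a proposal

    def commit(self, owner: str, due: date) -> None:
        """Record that the owning team has prioritized and accepted this action."""
        self.owner = owner
        self.due = due
        self.committed = True


def split_by_commitment(
    follow_ups: List[FollowUp],
) -> Tuple[List[FollowUp], List[FollowUp]]:
    """Return (committed, proposed) lists, preserving the input order."""
    committed = [f for f in follow_ups if f.committed]
    proposed = [f for f in follow_ups if not f.committed]
    return committed, proposed


# Example data is made up for illustration.
follow_ups = [
    FollowUp("Add alerting for queue depth"),
    FollowUp("Document the failover runbook"),
]

# Later, after the owning team has weighed this against other planned work:
follow_ups[0].commit(owner="platform-team", due=date(2030, 1, 15))

committed, proposed = split_by_commitment(follow_ups)
```

Keeping proposals separate from committed actions means the meeting can capture every idea without anyone having to promise a task, owner, and timeline on the spot.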
How we run post-mortem meetings at incident.io
Here at incident.io, we run a debrief meeting (our name for a post-mortem) for every major or critical incident that we respond to. We also occasionally run them for lower-severity incidents, at the discretion of the incident lead or by request from anyone involved.
For less severe incidents, it's common for the lead to write the post-mortem and run the debrief. For bigger incidents, someone else, like a tech lead, will run the debrief. This frees up other key players to contribute to the conversation.
We also record the meetings using Google Meet, so they’re available for future reference and to share with anyone interested. This is a great way for folks to benefit from hearing the conversation in the room, without needing to inflate the invite list.
To make it easier to prepare for the meeting, we use the post-mortem template built into incident.io, which pulls specific information from the incident into a debrief document that we use to steer our conversation.
That said, we don't require a super polished post-mortem document before running the debrief. One of our company values is to trust by default, so we let folks choose what’s important to include and go from there.
When going through the meeting, this is the structure we usually stick with:
- The facilitator walks through the post-mortem document, i.e., briefly summarizes the incident and walks through the timeline.
- The discussion happens, normally based on a set of "talking points" or "learning points."
- New follow-up actions are created during the discussion to make sure we’re applying our learnings and building better resilience.
- After the meeting, the follow-ups are completed or pushed into an issue tracker, and the notes are sent around to all relevant stakeholders.
We're good at stopping ourselves from going into rabbit holes or overrunning during these meetings. If it looks like something needs to be investigated or a particular problem needs to be solved, we'll create a follow-up and assign it to someone, which keeps the session focused on learning.
But we also make sure that we aren’t ever “overcorrecting” either. If we’re setting any follow-up actions, we aim for them to be proportionate to the severity of the incident.
Some post-mortem meeting tips from our very own incident responders
To round things out, we’ve compiled a few tips for running post-mortem meetings from two of our incident responders here at incident.io, Milly and Sam.
- Don't invite too many people to the post-mortem meeting. If it's valuable for people to "sit in," record it instead. The more people attend, the harder it is to prompt honest discussion.
- The same goes for leadership. If you invite all senior leaders, the environment will feel more like "this really bad thing happened, and all the leaders are watching you explain yourself."
- There are two ways you can learn from what happened during an incident:
  - How can we stop this kind of problem from happening again in the future?
  - How can we respond to this kind of problem better next time?
- Don't wait too long to have the debrief. It's better to have a scrappier post-mortem and to hold the debrief sooner than it is to polish the doc for ages and have forgotten what happened by the time you run it. Remember, you can always edit it after the meeting.
In the end, what we find most important is having the space to discuss the incident at length. The goal is always to come out of the meeting with a shared understanding of how an incident happened and an agreement on whether there are any follow-up actions we need to take—and who’s doing them.
For us, this is what a successful post-incident meeting looks like.