You've just made it through a particularly tough incident.
It was a short outage affecting a subset of customers, so not exactly the end of the world, but bad enough that it involved multiple people across a number of teams to resolve. Either way, the incident was well managed, and the dust has settled.
Now what?
Most guidance would say that putting together a post-mortem document is a good idea, given the severity of the incident. You've also done this, so what's next?
It's time to prepare and run your post-mortem meeting! But it isn't enough to go into a room with the people involved and wish for the best. No, there are a few things that you need to keep in mind before, during, and after your post-incident meeting to get the most out of it.
An incident post-mortem meeting is held after an incident. This is a space where a small group of responders and other stakeholders discuss the incident, its contributing factors, and its impacts.
The goal is to go beyond the post-mortem document and discuss the incident in a live setting. It's much less reactive than the incident response itself, and it focuses on creating better long-term outcomes.
While we're on the topic of post-mortems, it's worth diving into the concept of blamelessness since you'll hear it coming up a lot. If you're looking for a deeper dive, we wrote extensively about the concept of blameless post-mortems here.
The thinking is that incidents and any post-incident activities should be free of finger-pointing. Instead of focusing on the "who" behind an incident, blameless culture calls for understanding the "why" and "how," looking at the many causes of an incident, also known as its contributing factors.
By doing this, you eliminate the anxiety and stress around post-incident processes.
To facilitate a productive post-mortem meeting, it's important to establish a few ground rules. These can help promote open and honest discussion, give everyone equal time to raise concerns, and make sure critical issues are addressed first.
There’s a myriad of ways you can run a post-mortem meeting, but let’s look at some best practices that most folks agree are a good starting point.
First things first—it's important to include all relevant stakeholders in the post-mortem meeting, including any incident responders, key decision-makers, and individuals directly involved in the incident. The goal is to make sure that every perspective and nuance is accounted for. Remember, everyone will have a unique take on the incident worth factoring in.
Before the meeting, set a clear agenda outlining some of the objectives and topics you want to discuss as a team. This helps keep the meeting focused and ensures that all agreed-upon talking points are covered.
It doesn't need to be a rigid agenda, and it's entirely expected (and encouraged!) that you'll end up talking about other topics, but going in with some pre-work and thoughts on areas of interest can be incredibly helpful.
Your post-mortem document is a good place to start!
During the meeting, the incident should be reviewed from end to end, including the timeline of events, the impact on the organization, and the actions taken to mitigate the incident. This allows the team to gain a complete understanding of what happened.
Discussion should be actively encouraged during the review. Remember, the goal is to learn, not to recite the contents of the document. If people are less forthcoming with contributions, it can be helpful to ask open questions to gather additional perspectives.
Next, try to identify any contributing factors to the incident. This involves analyzing technical issues, human error, or process gaps. By identifying these, teams can implement process and technical improvements to help prevent similar incidents in the future.
The post-mortem meeting should provide an opportunity for open and honest discussions about the lessons learned from the incident. This includes identifying areas for improvement, discussing best practices for resolving incidents like these, and sharing knowledge among team members.
Based on the insights gained from the meeting, a follow-up plan should be created to address the identified issues and prevent future incidents. This plan should include specific tasks, owners, and timelines.
It can be tempting to create long lists of actions in a post-mortem meeting. At the point of reviewing the incident, these actions are likely to feel incredibly important. This is recency bias at play!
To counter this, we’d encourage you to avoid committing to actions until the owning team has had time to understand the plans and weigh up the relative importance against other planned work.
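If it helps to make this concrete, here's a minimal sketch of how a follow-up plan could be modeled so that actions raised in the meeting start out as proposals and only become commitments once the owning team has agreed to them. Everything here (the `FollowUp` and `Status` names, the fields, the example tasks and team names) is a hypothetical illustration rather than part of any particular tool.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Status(Enum):
    # Hypothetical states for illustration only.
    PROPOSED = "proposed"    # raised in the meeting, not yet agreed
    COMMITTED = "committed"  # accepted by the owning team


@dataclass
class FollowUp:
    """One entry in the follow-up plan: a task, an owner, and a timeline."""
    task: str
    owner: str
    due: Optional[date] = None          # no timeline until the owner commits
    status: Status = Status.PROPOSED    # every action starts as a proposal

    def commit(self, due: date) -> None:
        """The owning team accepts the work and sets a due date."""
        self.due = due
        self.status = Status.COMMITTED


def committed(plan: List[FollowUp]) -> List[FollowUp]:
    """Return only the follow-ups the owning teams have agreed to."""
    return [f for f in plan if f.status is Status.COMMITTED]


# Example plan with made-up tasks and owners.
plan = [
    FollowUp("Add alerting for queue depth", owner="platform-team"),
    FollowUp("Document the failover runbook", owner="sre-team"),
]

# Later, after weighing it against other planned work, one team commits.
plan[0].commit(due=date(2030, 1, 31))

print([f.task for f in committed(plan)])
```

Keeping "proposed" and "committed" as separate states is one way to capture ideas during the meeting without treating them as promises before anyone has had a chance to prioritize them.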
Here at incident.io, we run a debrief meeting (our name for a post-mortem) for every major or critical incident that we respond to. We also occasionally run them for lower-severity incidents, at the discretion of the incident lead or by request from anyone involved.
For less severe incidents, it's common for the lead to write the post-mortem and run the debrief. For bigger incidents, someone else, like a tech lead, will run the debrief. This frees up other key players to contribute to the conversation.
We also record the meetings using Google Meet, so they’re available for future reference and to share with anyone interested. This is a great way for folks to benefit from hearing the conversation in the room, without needing to inflate the invite list.
To make it easier to prepare for the meeting, we use the post-mortem template built into incident.io, which pulls specific information from the incident into a debrief document that we use to steer our conversation.
That said, we don't require a super polished post-mortem document before running the debrief. One of our company values is to trust by default, so we let folks choose what’s important to include and go from there.
When going through the meeting, this is the structure we usually stick with:
We're good at stopping ourselves from going down rabbit holes or overrunning during these meetings. If it looks like something needs to be investigated or a particular problem needs to be solved, we'll assign a follow-up to someone so that the session stays focused on learning.
But we also make sure that we aren’t ever “overcorrecting” either. If we’re setting any follow-up actions, we aim for them to be proportionate to the severity of the incident.
To round things out, we’ve compiled a few tips for running post-mortem meetings from two of our incident responders here at incident.io, Milly and Sam.
In the end, what we find most important is having the space to discuss the incident at length. The goal is always to come out of the meeting with a shared understanding of how an incident happened and an agreement on whether there are any follow-up actions we need to take—and who’s doing them.
For us, this is what a successful post-incident meeting looks like.