
Why you need an incident timeline

We get it – incidents happen. What differentiates resilient teams from others is how they learn from them: using them as an opportunity to find the biggest improvements in how they work.

Incident timelines are one of the simplest and most effective tools available to you when it comes to learning from an incident. To make the biggest improvements after an incident, it’s vital that they’re accurate and useful.

In this article we’ll talk about why the incident timeline is so important for gaining insights, and how to make the most of it.

Using the timeline to run an effective post-incident process

Our take on running an effective debrief? Use the timeline as your guide, and construct the story of your incident from the key events that occurred.

We think starting a debrief by walking through your timeline is the best way to identify the key events and decisions that shaped your response.

Your goal as an incident lead is to capture what happened as vividly as possible, fuelling the conversation and taking people back to what they were seeing, thinking, and feeling.

Approaching your debrief in this way has the following benefits:

  1. Getting everyone on the same page. People lose focus on the incident in the time between things calming down and when you assemble for a debrief. As the lead, your goal should be getting everyone to a place where they can provide the most useful insights from their experience of the incident. Walking through what happened gets everyone back to where they were in the incident and is a really great memory trigger to start a meeting.
  2. Identifying key discussion points. Walking through what happened with a room of people involved will highlight the points at which something unexpected happened, which you can then dig into as a group. Just like rubber-ducking with code, talking through what went on is a foolproof way to identify points where strange things happened – this will usually lead organically to discussions around why things occurred the way they did.

So, you’ve made it to your debrief and you’re reading through your timeline. What next?

As the incident lead, your aim is to maximise the learnings from this process. Here are a few key questions to focus on when looking through the events that occurred in your incident. Keeping questions like these in mind helps you spot the important discussion points more quickly.

  1. Identifying the incident
    • How was this incident raised?
    • Could it have been identified earlier?
    • Was this something you could have been alerted for?
    • Did a customer notice it before you did?
  2. Communication
    • Were all internal and external stakeholders kept up to date at the right times with what was going on?
    • Was it clear who was responsible for communication?
    • If you’re a distributed team, were remote folks kept up to date with decisions that were made in person?
  3. People
    • Were the right people involved at the right moments?
    • Did you find yourself in a “too many cooks” situation?
    • Did it take a while for the relevant team to find out about the incident?
    • How can you ensure incidents reach the right people as quickly as possible?

Principles for a useful timeline

Timelines should tell a story backed by an audit trail

It’s really important to distinguish between the two uses of a timeline: the audit trail of events that really happened, and the story of the incident.

Whilst telling a story with the timeline is really important, the core record of which events occurred, and when, is critical: it leaves an accurate account of what happened that you can look back on.

You should never alter timestamps or the order in which things happened to help your narrative.

It’s essential to keep your incident timeline accurate. Aim for a timeline where, in a year’s time and with zero context, you could still find the raw facts of what happened and when, without embellishment. Changes you make in the moment that feel like clarifications might add to the confusion when you come back to this incident later.
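
One way to keep the audit trail and the story apart is to make the raw facts read-only and layer commentary on top. The sketch below is a minimal illustration of that idea, not how any particular tool stores timelines: the `TimelineEvent` type, its fields, the `annotate` helper, and the sample event are all made up for this example.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimelineEvent:
    """One raw fact in the audit trail. Frozen, so it can't be edited in place."""
    occurred_at: datetime
    source: str
    description: str
    note: str = ""  # narrative commentary, added later for the story


def annotate(event: TimelineEvent, note: str) -> TimelineEvent:
    """Return a copy with story context added; timestamp and description are untouched."""
    return replace(event, note=note)


# Made-up event, for illustration only.
raw = TimelineEvent(
    occurred_at=datetime(2024, 1, 15, 9, 42, tzinfo=timezone.utc),
    source="alert",
    description="Error rate alert fired",
)
annotated = annotate(raw, "First signal that something was wrong.")
```

Because the dataclass is frozen, assigning to `raw.occurred_at` raises an error, while `annotate` lets you add narrative without touching when or what actually happened.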

Things not happening is just as important as them happening

It’s easy to focus on the times when things happened. In reality, you can often glean a lot more insight from the gaps in the timeline. Why was there a pause in activity?

Was the incident lead unassigned, leaving next steps unclear?

Was there a long gap where comms weren’t updated?

Was there an IRL meeting that should have been written up for those not in the office?
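
If your timeline events carry timestamps, gaps like these are easy to surface before the debrief. Here’s a rough sketch: the `find_gaps` function, the 15-minute threshold, and the sample times are all assumptions for illustration, so pick a threshold that suits how your incidents usually run.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(minutes=15)):
    """Return (start, end) pairs where consecutive events are more than `threshold` apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > threshold
    ]


# Made-up event times, for illustration only.
events = [
    datetime(2024, 1, 15, 9, 42),
    datetime(2024, 1, 15, 9, 45),
    datetime(2024, 1, 15, 10, 20),
    datetime(2024, 1, 15, 10, 24),
]

gaps = find_gaps(events)
for start, end in gaps:
    print(f"No recorded activity between {start:%H:%M} and {end:%H:%M}")
```

With the sample times above, this flags the stretch between 09:45 and 10:20. The code can only tell you that a gap exists; working out why it happened is still a conversation for the debrief.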

Timelines are living documents

Incident leads often fall down when they bring the timeline to a debrief and assume it is a complete document, containing everything that went on. In reality, the debrief is the perfect time to trigger people’s memories and extend the timeline, finding pieces of nuance among the events.

As an incident lead you don’t need to bring a complete and final timeline to a debrief – your role is actually to facilitate finding the story of the incident amongst the chaos.

To do this, it’s essential to lean on others to bring their context. It’s also particularly important as a lead to run a meeting with an open dialogue for all participants. Your job is to ensure that the timeline captures what was going on for everyone involved. Making space to hear the perspective of the quieter people in the room can often be a great place to find insights you wouldn’t have otherwise.

It’s worth noting here that when analysing the timeline, perspectives from across your org are really important. Debriefs can often fall down by involving engineers only – it’s a big mistake not to seek out the thoughts of Customer Support folks, or anyone else across the org who got involved.

I want an incident timeline. How should I create one?

Now that we’re agreed timelines are useful, what’s the best way to create one?

Your incident timeline should contain events that occurred during the incident, and the exact times at which they happened. It’s important that the timeline is a thorough representation of what happened – don’t leave events out at this stage.

Alerts, conversations in Slack or in person, pull requests, and external comms are all examples of events that should be captured in your timeline.
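
If you’re assembling a timeline by hand, the core of the job is pulling events from each of those sources into a single list ordered by time. The sketch below shows one way that could look. The `Event` shape, the `build_timeline` and `at` helpers, and every sample event are invented for illustration; in practice the events would come from your own tools.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Event(NamedTuple):
    occurred_at: datetime  # timezone-aware, so events from different sources compare cleanly
    source: str            # e.g. "alert", "slack", "pull_request", "comms"
    summary: str


def build_timeline(*sources):
    """Merge events from every source into one list, ordered by when they happened."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event.occurred_at)


def at(hour, minute):
    """Helper for the made-up data below: a UTC time on an arbitrary day."""
    return datetime(2024, 1, 15, hour, minute, tzinfo=timezone.utc)


# Illustrative events only.
alerts = [Event(at(9, 42), "alert", "Error rate alert fired")]
slack = [
    Event(at(9, 50), "slack", "Incident channel opened"),
    Event(at(10, 5), "slack", "Rollback agreed"),
]
pull_requests = [Event(at(10, 12), "pull_request", "Rollback PR merged")]
comms = [Event(at(9, 58), "comms", "Status page updated")]

timeline = build_timeline(alerts, slack, pull_requests, comms)
for event in timeline:
    print(f"{event.occurred_at:%H:%M}  {event.source:<12}  {event.summary}")
```

Storing every timestamp in one timezone (UTC here) avoids ordering mistakes when events come from tools that each report local time differently.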

Here at incident.io, we automatically capture the key events that occur as they happen.

Once you have your raw timeline, it’s really important to put in the work to construct the narrative, which will help you run better debriefs. Read more about how to tell the story of your incident in our blog on curating a timeline.

Martha Lambert
Product Engineer