Where does the time go after you resolve an incident?

We were curious: once an incident is over, how long does it take companies to document, review, create learnings, finish clean-up items, and complete any other follow-up action items?

We work with a wide variety of companies, from small start-ups to enterprises with thousands of engineers, so we dug into our data to see where their time goes after they resolve an incident.

Here’s what we found!

Follow-ups

When an issue is resolved and you get a moment to breathe, there are likely things you learned along the way that you need to act on to ensure it doesn’t happen again, improve your product, and help resolve the same issue more quickly in the future. These follow-ups can be time-consuming, and some need to be completed in a timely manner so your team can get back to planned work.

We looked at 14,000 follow-ups across our entire base of real incidents to see how quickly they get completed.

What did we learn?

Follow-ups take seven days.

  • Overall, the median amount of time to complete a follow-up action is 7 days.

Smaller companies complete follow-ups quicker.

  • Customers with fewer than 100 employees are slightly faster at completing follow-ups, with a median of about 5 days.

Those who are fast are really fast...

  • The 25th percentile time to complete follow-ups (i.e. this would put you in the fastest quarter of users) is under 1 day!

...but those who aren’t fast are very slow.

  • There is a big variation between those who do it 'well' vs. 'okay.' On the other extreme, the 75th percentile is about 3 weeks.

The size of the company matters.

  • 'Medium'-sized companies (250-500 employees) are the slowest at completing follow-ups, consistently taking 10-15% longer to complete them than small or large companies.
    • Why? With medium-sized companies, there is more of an issue around coordination and process. Smaller companies have closer-knit teams with individual users who have more autonomy, less technical overhead, and fewer incidents, and larger companies likely have invested in better processes and more resources to facilitate follow-up completion. In short, we can chalk this up to growing pains.
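
If you want to run the same kind of numbers on your own follow-ups, the median and the 25th/75th percentiles above can be computed with a few lines of Python. This is only a minimal sketch: the sample rows, the size labels, and the helper names `summarize` and `summarize_by_size` are made up for illustration and are not our actual data or analysis code.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical sample rows: (company size label, days to complete a follow-up).
# These values are invented for illustration only.
FOLLOW_UPS = [
    ("small", 0.5), ("small", 3.0), ("small", 5.0), ("small", 9.0),
    ("medium", 1.0), ("medium", 6.0), ("medium", 10.0), ("medium", 25.0),
    ("large", 0.8), ("large", 4.0), ("large", 8.0), ("large", 20.0),
]


def summarize(days):
    """Return the 25th percentile, median, and 75th percentile of a list of durations."""
    # quantiles(..., n=4) returns the three cut points that split the data into quarters.
    # It needs at least two data points.
    p25, p50, p75 = quantiles(days, n=4, method="inclusive")
    return {"p25": p25, "median": p50, "p75": p75}


def summarize_by_size(rows):
    """Group (size, days) rows by size label and summarize each group."""
    by_size = defaultdict(list)
    for size, days in rows:
        by_size[size].append(days)
    return {size: summarize(days) for size, days in by_size.items()}


if __name__ == "__main__":
    print("overall:", summarize([days for _, days in FOLLOW_UPS]))
    for size, stats in summarize_by_size(FOLLOW_UPS).items():
        print(size, stats)
```

Looking at percentiles rather than a single average is a deliberate choice here: a handful of follow-ups that stay open for weeks can pull a mean upward, while the median and quartiles show what a typical team actually experiences.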

To summarize, if you work in a medium-sized company and you sense you’re not getting those post-incident fixes deployed quickly enough, you’re not alone. The good news is having great tooling (like incident.io 😉) can help, with the post-incident dashboard for visibility, follow-up policies for defining what good looks like, and nudges to remind folks to keep them front of mind!

Post-incident tasks

In other words, housekeeping. No one loves to be given a to-do list or to spend hours documenting, but we’ve worked really hard to help our customers focus on only the important things to do so they can learn from incidents. What’s important varies by customer, and they can have a variety of tasks like:

  • Ensure you have accurate timestamps for events, like the incident start time or when it was resolved, so you can determine how long it took to identify the problem.
  • Create a really great postmortem document for your team to review learnings.
  • Fill out some important metadata to identify and react to the patterns.

While every company constructs its post-incident flow of tasks differently, there are some common themes we wanted to examine to see how much time was spent on these tasks. We examined around 13,000 real incidents that entered our post-incident flow of tasks.

What did we learn?

Post-incident tasks take one day.

  • The median time to complete all post-incident tasks is under one day, measured from when an incident enters the post-incident flow to when it is closed.

Not all tasks are created equal.

  • The time spent on a task varies a lot depending on the task type. Some tasks are quick (like adding some metadata or setting a timestamp), while others involve more cognitive energy and time (like reviewing the timeline to tell the right story or listing what the team learned from this).

People care about how their follow-up action items are handled.

  • It takes about 3 hours to review follow-up action items to ensure they are assigned appropriately and have an accurate priority associated with them. You may have one follow-up action item or ten, but the role of the person taking on this task is to ensure these get handled appropriately and by the right teams, so putting in this amount of effort is no surprise.

It’s quick to get your incident data accurate.

  • To have an accurate account of your incident, you want to review your summary, curate your timeline, fill in custom metadata, and verify that your timestamps are filled out correctly. Very intentionally, we’ve made it easy for folks to keep these up-to-date during the active phase of an incident lifecycle and easy to review them after it’s resolved. We were pleased to find that we’ve done a great job here. It takes companies under an hour to review and ensure their incident details are accurate.

Understandably, postmortem documents take the most time.

  • It takes an average of under 2 hours to create a postmortem. Thanks to our automatic postmortem generator, generating a document with all the relevant information takes less than 10 seconds. So that average of under 2 hours reflects the human element of having a task on your plate that you are “about to do,” not the time it takes to generate the document.
  • On average, it takes about 4 days to complete and share the postmortem document.
    • This is because incidents have a human learning element. Once you have all the information at your fingertips in a document, it takes someone with organizational and technical context to synthesize it into “what went well” and “opportunities to improve.” No amount of AI will replace how it felt to be in the incident, define true financial impact, or understand how your customers reacted to the issue they faced. A person still needs to spend some time at the 50,000-foot view of the assembled information and add that to the postmortem document. This is where the real learning happens, and it is time well spent!
    • That said, we want to speed this process up! This quarter, we are exploring AI-generated learnings. This will reduce the time burden by presenting some options for your incident commanders to consider as they prepare their postmortem review.

Once again, those that are fast at completing their postmortems are really fast.

  • The 25th percentile completes the postmortem within 24 hours! As a best practice, the sooner you can complete your postmortem document, the better. The human memory is a fickle thing. The more time passes, the less likely you are to recall crucial learnings or track pertinent pain points.
  • The 75th percentile takes just over a week to complete the postmortem.

Getting the right people in the room to discuss your findings is a breeze.

  • On average, folks schedule a debrief meeting to review the postmortem learnings in under a day. We offer quick scheduling, Google Calendar integrations, and the auto-adding of the correct participants based on your set criteria. But depending on the size of the company or teams involved, a human coordinating other humans is still sometimes needed.

This is just the tip of the iceberg of what we’ve learned about where users spend their time in the post-incident phase. We can’t wait to dig deeper and keep helping organizations learn from their incidents... quickly!

Eryn Carman
Customer Success Manager