Value of incident data
While we've covered how to learn from incidents in isolation, there's a goldmine of data to be found in analysing incidents in aggregate.
By measuring several dimensions of incidents over time, and tagging your incidents to add structure to the data, it becomes possible to:
Track how much operational work incidents are generating, and where those incidents are coming from.
Identify patterns and trends, such as whether incident load is increasing, that might not be apparent on a per-incident basis.
Aggregate at different levels
Understand the impact of incidents on your organisation, all the way from business divisions to the individual employees.
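As a rough illustration of the points above, the sketch below sums incident workload at two levels: per team, and per team per month. The record fields (`opened`, `team`, `hours_spent`), the `workload_by` helper, and all values are hypothetical, chosen only for this example; they are not an incident.io export format.

```python
from collections import defaultdict
from datetime import date

# Hypothetical incident records: field names and values are made up for illustration.
incidents = [
    {"opened": date(2023, 1, 9), "team": "payments", "hours_spent": 6.0},
    {"opened": date(2023, 1, 20), "team": "platform", "hours_spent": 2.5},
    {"opened": date(2023, 2, 3), "team": "payments", "hours_spent": 11.0},
    {"opened": date(2023, 2, 17), "team": "payments", "hours_spent": 4.5},
]


def workload_by(records, key):
    """Sum hours spent on incidents, grouped by whatever key(record) returns."""
    totals = defaultdict(float)
    for record in records:
        totals[key(record)] += record["hours_spent"]
    return dict(totals)


# Aggregate at team level, then at (team, month) level to expose trends over time.
by_team = workload_by(incidents, lambda r: r["team"])
by_team_month = workload_by(
    incidents, lambda r: (r["team"], r["opened"].strftime("%Y-%m"))
)

print(by_team)
print(by_team_month)
```

Passing the grouping as a function means the same records can be re-aggregated by any tag you attach to an incident, without changing the summing logic.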
Why should I care?
When incidents occur, team members are required to respond alongside their normal responsibilities, and the time commitment – while hard to measure – can be significant. And if an individual or team is impacted more than most by increasing incident workload, it's important to proactively address the issue: otherwise you might see product deadlines slip, and employee happiness fall.
If you're in leadership or manage a team, keeping up to date with incidents can give valuable insight into team health. Incidents are often handled by the people closest to the day-to-day, so for senior staff further from the action, incident data can be one of the most direct and honest signals of how things are going.
Is MTTR not enough?
Businesses have traditionally tracked incidents through measures like mean-time-to-respond (MTTR) and similar averages. While these are easy to calculate using spreadsheets, research from companies like Google (see MTTR is a Misleading Metric) shows they are too shallow to give real insight into incident performance, with MTTR reflecting the worst outliers rather than real performance.
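To see why an average can be dominated by outliers, here is a minimal sketch using made-up incident durations. The numbers are hypothetical and are not drawn from Google's research or from incident.io data. It only shows how a single long incident moves the mean while leaving the median almost unchanged.

```python
from statistics import mean, median

# Hypothetical incident durations in minutes; the last one is a single long outlier.
durations = [12, 15, 18, 20, 22, 25, 30, 35, 40, 600]

mttr = mean(durations)  # the MTTR-style average, pulled up by the outlier
typical = median(durations)  # the middle of the distribution
mttr_without_outlier = mean(durations[:-1])  # same data with the outlier removed

print(f"mean: {mttr:.1f} min")
print(f"median: {typical:.1f} min")
print(f"mean without the outlier: {mttr_without_outlier:.1f} min")
```

Nine of the ten incidents here took 40 minutes or less, yet the mean is over 80 minutes, so a month-to-month change in this figure says more about whether a long incident happened than about how responders performed.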
With the advent of tools like incident.io, which track and measure much more of the incident process without adding friction, we can go beyond measures like MTTR to find genuinely useful data that can be used to improve incident performance or team health.
Read on to see what is possible now that we have this data, using data from incident.io's own incidents for each example.