See the full report—Incident metrics pulse: How organizations are measuring their incident management
What metrics do you look at to measure how efficient your incident response is?
This is a question we get asked all the time, and one we empathize with deeply. While there are several well-established incident metrics that organizations commonly use, like MTTR and raw counts of incidents, a vast number of them are ineffective or, worse still, entirely misleading.
So we wondered how teams sorted through all of this noise to figure out which metrics would deliver the most insight and impact. To get some answers, we put together a simple survey and sent it to world-class organizations such as Etsy, SumUp, and Ramp.
Below is a quick preview of what we unearthed. But be sure to check out the full report to get the whole story.
For an overwhelming number of respondents, MTTx metrics are still the gold standard for tracking the efficiency of incident response, in particular the DORA metric MTTR.
On the surface, this makes sense. But if you dig deeper, there's a big problem with MTTR: it can badly misrepresent the actual state of things at your organization.
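For readers who want to see the arithmetic behind these metrics, here is a minimal Python sketch. The incident timestamps are made up for illustration, and it assumes both MTTD and MTTR are measured from the moment an incident starts (definitions vary between teams). It is not taken from the report.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical incident records: (started, detected, resolved).
# These timestamps are invented for illustration only.
INCIDENTS = [
    ("2024-01-03T10:00", "2024-01-03T10:05", "2024-01-03T10:35"),
    ("2024-01-09T14:00", "2024-01-09T14:10", "2024-01-09T14:50"),
    ("2024-01-20T02:00", "2024-01-20T02:15", "2024-01-20T10:00"),
]


def minutes_between(start: str, end: str) -> float:
    """Return the elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def summarize(incidents):
    """Compute mean time to detect/resolve, plus the median time to resolve."""
    detect = [minutes_between(started, detected) for started, detected, _ in incidents]
    resolve = [minutes_between(started, resolved) for started, _, resolved in incidents]
    return {
        "mttd_minutes": mean(detect),
        "mttr_minutes": mean(resolve),
        "median_ttr_minutes": median(resolve),
    }


summary = summarize(INCIDENTS)
print(summary)
```

With this sample data, the single overnight incident pulls MTTR to roughly 188 minutes, while the median time to resolve is 50 minutes. That sensitivity to outliers is one commonly cited limitation of averages, and not necessarily the specific problem the full report covers.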
Are upper management and teams on the ground at odds over what makes the most sense to track? The survey responses suggest this is the case. For example, one respondent mentioned that they track MTTD at the team level, but report MTTD and MTTR up to management. And this was a theme throughout the responses.
Much of what they report is the same, but teams will often tack on a few more metrics for management.
But why?
While the majority of teams do take action on the insights they gather from their metrics, enough respondents said "no" to get us thinking: why?
There are plenty of totally rational explanations here. Time is one. The idea of adding a bunch of follow-up actions to an already loaded list of work can seem daunting.
However, consider the alternative: teams just aren't finding the metrics they track all that useful.
Today, optimizing for cost is everything. The problem is that figuring out exactly how much an incident costs your organization can be a big challenge. So when asked what they would want to track if they could, several respondents mentioned cost or financial impact.
Given the era of hyper-competition that we currently find ourselves in, this is going to be top of mind for many organizations across the spectrum.
You can download the full report by clicking on the image below.