Incident management vs problem management: understanding the connection between the two
Sometimes, two concepts overlap so much that it’s hard to view them in isolation. Today, incident management and problem management fit this description to a tee.
But this wasn’t always the case. For a long time, these two ITIL concepts were seen as distinct, with specialized roles overseeing each. Incident management existed in one corner, overseen by operators, and problem management in the other, run by developers.
As development and operations became less siloed through movements like DevOps, the lines suddenly became blurred.
So where do they stand today?
Most would argue that the practices surrounding problem management and incident management are so interrelated that they should be considered one and the same. But while they overlap in the court of public opinion, it’s still important to note what each entails to understand how they ultimately work together.
What’s the connection between problems and incidents?
The best way to understand the link between problem and incident management is to think in terms of cause and impact.
Incident management focuses on putting out fires. It covers the practices and processes companies employ to mitigate impact as quickly and efficiently as possible.
Problem management is oriented around understanding and addressing the underlying causes of these fires, to reduce the likelihood and impact of future ones.
In the context of software development, imagine you have a service that’s regularly falling over and resulting in recurring incidents. Here, problem management would focus on diagnosing what’s causing failures in the first place, while incident management is the reactive effort to resolve each failure as it happens.
In general, a single problem may cause one incident or many. This is why problem management is conceptually important: it aims to identify issues at the source and nip them in the bud before they snowball into a wave of preventable incidents that strain resources.
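To make the one-to-many relationship concrete, here is a minimal Python sketch that groups incidents by a suspected cause and surfaces the causes behind more than one incident. The `Incident` record, the `suspected_cause` label, the sample data, and the threshold of two are all illustrative assumptions, not part of any particular tool or process.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    id: str
    service: str
    suspected_cause: str  # hypothetical label, e.g. assigned during a debrief


def group_by_cause(incidents):
    """Map each suspected cause to the IDs of the incidents attributed to it."""
    problems = defaultdict(list)
    for incident in incidents:
        problems[incident.suspected_cause].append(incident.id)
    return dict(problems)


def recurring_problems(incidents, threshold=2):
    """Return only the causes linked to at least `threshold` incidents."""
    return {
        cause: ids
        for cause, ids in group_by_cause(incidents).items()
        if len(ids) >= threshold
    }


# Made-up sample data: one underlying problem, several incidents.
incidents = [
    Incident("INC-1", "checkout", "connection pool exhaustion"),
    Incident("INC-2", "checkout", "connection pool exhaustion"),
    Incident("INC-3", "search", "expired certificate"),
    Incident("INC-4", "checkout", "connection pool exhaustion"),
]

print(recurring_problems(incidents))
# {'connection pool exhaustion': ['INC-1', 'INC-2', 'INC-4']}
```

With this sample data, only the connection pool issue is flagged, since it sits behind three separate incidents. Real cause labels would be far messier and would usually come out of post-incident debriefs, not a tidy field.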
But this doesn’t mean that incident management is without its merits. Incidents are inevitable, and by focusing on a strong incident management process, teams can resolve them faster and more efficiently.
It’s all a balance. Ultimately, they complement one another.
How have these practices changed since the inception of DevOps?
Historically, incident management and problem management were seen as distinct practices carried out by entirely different teams. Developers would write and ship their code, and an Operations team would be responsible for running it.
Since the introduction of a more joined-up, “you build it, you run it” approach (call it DevOps, SRE, or whatever else is en vogue!), it’s rare to hear people talk about problem management at all.
In reality, since the same people are responsible for building, shipping, and operating their software, there’s little need for a separate name for these two concepts at all.
When an incident happens, teams will mitigate and fix things. And once the incident is over, they’ll dig into the underlying causes, too.
Of course, it takes work to put this into practice, and it’s not uncommon to see teams focus heavily on responding to incidents while forgetting or deprioritizing efforts to fix the underlying contributors.
Incident management and problem management are conceptually important and still relevant to organizations today—just not by title. The world is converging on a broader, more end-to-end definition of incident management, covering everything from the first response and mitigation through to debriefing and follow-up action completion.
No “problem management,” no problem.
The Debrief
We sat down with Chris to chat in detail about the concept of incident management and problem management.