In the fast-paced world of software development and product delivery, incidents are often viewed as unwanted disruptions.
Traditionally, incident management has kicked in only for critical issues: complete system outages, data loss, or security vulnerabilities. You don’t need to go back far to find a few very serious ones: Heartbleed, the xz utils backdoor, and more.
There has always been the idea that we should minimise the number of declared incidents. Fewer incidents is seen as a good thing: it avoids alarming stakeholders and maintains an appearance of stability.
I believe this perspective has to shift.
The concept of “shifting left” has already gained traction in various domains, like security and observability, where people promote the idea of addressing potential issues earlier in the development cycle.
An example of this “shifting left” in security is having product managers and engineers involve security teams, and think about security themselves, during the scoping stage rather than when deploying to production.
A similar “shift to the left” in incident management is beneficial for an organization’s growth and resilience.
When I say “shift left” I mean lowering the threshold for declaring incidents so that you become a lot more proactive rather than reactive. Depending on your maturity, I would go so far as to say you should declare an incident for any unexpected behavior that deviates from normal operations.
You go from dreading incidents to embracing them as opportunities for improvement.
I understand the reluctance within organizations to declare incidents. Concerns range from desensitizing team members to the seriousness of incidents, to the daunting meeting where you have to justify their frequency to upper management.
This is all compounded by the fear of damaging your company’s reputation. “What if people perceive our many incidents as a lack of control, or much worse, competence?”
A shift to the left has already brought many advantages to other domains, namely continuous delivery, security, and observability. Let’s see how doing the same in incident management will also bring many advantages.
Avoidance as a strategy usually doesn’t pay dividends. By lowering the threshold for what is considered an incident, and letting anyone declare one, I believe you’ll raise your company’s operational readiness and resilience. It’s similar to the Andon cable in a Toyota factory: anyone can pull it and halt production until a solution is found.
So, what changes when you shift left on incident management?
I had an "aha" moment when one of our engineers made a change to trigger incidents when we stopped being able to deploy to production.
Lisa describes in detail what happened in this lovely LinkedIn post. She broke the build on our main codebase, which meant that no one could deploy it to production. We had hooked our CI pipeline up to our HTTP alert source, which triggers an incident, and configured a workflow to escalate that incident to the person who merged the failing PR, so she got paged immediately. This kind of fast action will save us many developer hours in the long run, not to mention what we’ll learn when we review the incident.
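To make the wiring a bit more concrete, here is a minimal sketch of what a CI step for this could look like. It is not our actual pipeline code: the endpoint URL, the payload field names, and the shape of the `build` dictionary are all assumptions made for illustration, so check your own alert source’s documentation for the real format. The idea is simply that a failed build on the main branch becomes an alert that carries who merged the change, so a workflow can escalate to that person.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint: substitute the URL your HTTP alert source gives you.
ALERT_SOURCE_URL = "https://alerts.example.com/http/ci-main"


def build_alert(build: dict) -> Optional[dict]:
    """Return an alert payload for a failed main-branch build, else None.

    The `build` keys and the payload fields are illustrative assumptions,
    not a documented schema.
    """
    if build["branch"] != "main" or build["status"] != "failed":
        return None
    return {
        "title": f"Main branch build failed: {build['commit'][:7]}",
        "status": "firing",
        # One key per commit, so retries of the same broken build
        # can be grouped into a single incident by the receiver.
        "deduplication_key": f"ci-main-{build['commit']}",
        "metadata": {
            # The escalation workflow would page this person.
            "merged_by": build["merged_by"],
            "pr_url": build["pr_url"],
        },
    }


def send_alert(payload: dict, token: str) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        ALERT_SOURCE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

In a pipeline you would call `build_alert` with whatever your CI system exposes about the run, and call `send_alert` only when it returns a payload. Keeping the payload construction separate from the network call makes the decision logic easy to test without paging anyone.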
You need to declare more incidents.
Let’s move beyond the fear of having too many incidents and instead learn from the valuable data they provide to build stronger, more resilient systems. That requires a cultural shift away from fear and avoidance, towards openness, curiosity, and learning.
By embracing a broader definition of incidents, organizations can unlock valuable insights into their operations, fostering a culture of resilience and adaptability. In doing so, they are not merely preparing for the future; they are actively shaping it.