Webster's Dictionary defines 'incident' as... just kidding, I'm not going to do that to you.
But really, in a world where things go wrong in organizations all the time, what should our threshold be for defining an incident?
Unfortunately, there are no hard and fast rules that apply to every organization. That said, here are a few questions you can use to diagnose whether the particular problem you're facing is an incident.
Does the problem you're facing risk some negative impact on the product, business or customers?
I hope it goes without saying, but incidents generally aren't a positive thing. Someone, or something, might be negatively impacted in a small or large way.
Do you need to respond to the problem urgently, outside of your normal operational processes?
Incidents require urgent action to investigate and remediate. They aren't deferrable in the same way as a normal task or a bug might be.
Do you need to coordinate between multiple people or departments?
Incidents are generally not something one person can reasonably address end to end without involving anyone else.
Do you need to communicate to the rest of the organization, or to your customers?
Incidents often require other stakeholders to be kept in the loop with periodic updates on status. These might be your team, your board of directors, or your customers.
Is this something you want to discuss afterwards and review to extract learnings?
After an incident, you'll generally want to dig deeper to learn more, and see how you might prevent it from happening again.
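If it helps to see the five questions side by side, here's a rough sketch of them as a tiny Python checklist. Everything in it is an assumption made for illustration: the field names, the `Triage` class, and especially the `threshold` of three "yes" answers, which is an arbitrary starting point rather than a rule. The sample answers are one reading of two of the examples further down.

```python
from dataclasses import dataclass, fields


@dataclass
class Triage:
    """Yes/no answers to the five diagnostic questions (names are illustrative)."""
    negative_impact: bool        # risk to the product, business or customers?
    needs_urgent_response: bool  # outside normal operational processes?
    needs_coordination: bool     # multiple people or departments involved?
    needs_communication: bool    # stakeholders need to be kept in the loop?
    worth_reviewing: bool        # something you'd want to learn from afterwards?


def criteria_met(answers: Triage) -> list[str]:
    """Return the names of the questions that were answered 'yes'."""
    return [f.name for f in fields(answers) if getattr(answers, f.name)]


def looks_like_incident(answers: Triage, threshold: int = 3) -> bool:
    """Suggest an incident when at least `threshold` questions are 'yes'.

    The default threshold is an assumption, not a rule: tune it for your
    organization, and treat the result as a prompt for judgment.
    """
    return len(criteria_met(answers)) >= threshold


# Assumed answers for a dropped glass: urgent, but nothing else applies.
dropped_glass = Triage(
    negative_impact=False,
    needs_urgent_response=True,
    needs_coordination=False,
    needs_communication=False,
    worth_reviewing=False,
)

# Assumed answers for a support agent sending data to the wrong customer.
wrong_customer_data = Triage(
    negative_impact=True,
    needs_urgent_response=True,
    needs_coordination=True,
    needs_communication=True,
    worth_reviewing=True,
)

print(criteria_met(dropped_glass), looks_like_incident(dropped_glass))
print(criteria_met(wrong_customer_data), looks_like_incident(wrong_customer_data))
```

Lowering `threshold` is the code equivalent of lowering the bar for what you call an incident. The answers still need a human to supply them, so this is a conversation starter rather than a classifier.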
OK, but can I have some examples please?
Sure you can! As with most things, context is everything, so read these as guidelines rather than hard and fast rules.
Things we'd call incidents
- Too few food delivery riders on shift, with ETAs spiking as a result, is an operational and product incident.
- Your largest customer threatening to churn unless you re-negotiate their contract is a customer success incident.
- An ex-employee threatening to maliciously leak confidential information about the business is a security incident.
- A customer support agent sending data to the wrong customer is an operational and privacy incident.
- A recurring error affecting a single customer when they attempt to pay for their basket is an engineering and product incident.
Things we wouldn't call incidents
- A minor CSS formatting issue affecting users on a tiny percentage of browsers. It has a small negative impact on a very small number of users, but doesn't require an urgent response and you can prioritize it against other work.
- An employee resigning from the business. Although it has negative impacts, it's an expected normal business flow and doesn't need to be responded to urgently.
- Someone dropping a glass in the office. It needs to be cleaned up urgently so others aren't hurt, but doesn't require coordination, communication or systemic improvements.
Lowering the bar for incidents
Organizations generally set their threshold for incidents too high, so that only the most severe events are called incidents. We believe there's significant value in lowering that threshold: smaller incidents are a great way to learn how your systems fail, and they give teams a chance to practice their response before larger issues arrive.
When the cost of declaring an incident is low, there's little reason to avoid reporting and plenty of value to be extracted. Give it a go!
Image credit: Emily Morter