The importance of psychological safety in incident management

incident.io

When an incident strikes, it often brings a whirlwind of stress for everyone involved—from the teams directly handling the issue to the stakeholders making crucial decisions. Imagine support teams on high alert, customers anxiously awaiting resolutions, and executives probing for answers to steer the company through turbulent times.

This mounting pressure can make a challenging situation nearly unmanageable, especially when faced with problems that are new or unexpected.

Under such conditions, the importance of psychological safety becomes starkly apparent. Psychological safety, or the assurance that one can speak up, ask questions, and admit mistakes without fear of repercussion, is critical for effective incident management.

Without this foundational safety, not only does the stress intensify for those directly managing the incident, but a domino effect also threatens to destabilize other aspects of organizational health.

Here, we’re going to talk through the importance of psychological safety in managing incidents.

What is psychological safety?

While it may sound like a term lifted straight out of a Psychology 101 university course, psychological safety does have an important role to play within engineering teams.

But first, some context.

Coined by psychologist Carl Rogers in the 1950s, the term first made its rounds in industries such as manufacturing and aerospace. As time went on, it started to be applied more generally as a way to better promote organizational learning.

Eventually, the term was popularized by Harvard professor Amy Edmondson, who, in her book Right Kind of Wrong: The Science of Failing Well, defined psychological safety as:

A belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes, and that the team is safe for interpersonal risk-taking.

If you’re reading this and thinking back to some uncomfortable moments, you wouldn’t be alone. Far too often, folks are judged and criticized for incidents, leading to an environment where they’re fearful to speak up. Instead of focusing on doing their best work, they worry more about not messing up.

How psychological safety fits into incident management

Above all, a lack of psychological safety is completely at odds with a culture of incident response that promotes deep learning and blamelessness. If responders are being expected to do their best work, psychological safety has to be a value instilled from day one.

Put bluntly, ignoring this can be a significant blocker to good incident response.

Being afraid to make mistakes

If individuals are used to being singled out as the cause of an incident, you’re going to have challenges across the whole incident lifecycle. Whether it’s people underreporting issues, feeling too anxious to go on call, or actively sidestepping important but high-risk work, it’s not a great place to be.

Unfortunately, this practice is still relatively commonplace, particularly if teams are accustomed to looking for singular root causes rather than understanding the (often many!) contributing factors. The former, whether intentionally or not, tends to single out individuals, as opposed to a broader system, as the reason for an incident.

Ultimately this can lead to blame, which isn’t conducive to learning or morale.

Root cause vs contributing factors

The idea of root cause plays a significant role in this conversation. While it's a method that aims to zero in on the specific triggers of an incident, it's a bit of an old-school approach, primarily because looking for root causes tends to lead to shallower incident analysis.

And, as a result, you might end up considering the person pushing the change to production as a “root” cause, but looking deeper will reveal many contributors that allowed it to happen.

In short, it’s not a “root” at all, and what could have been blame directed at an individual might be better considered a problem with a process that allowed the issue to take place.

Today, we think it makes sense to look at contributing factors, some of which will be things people did or didn’t do, and some of which will be environmental, process, or systems related.

Not wanting to get involved

Effective incident response depends on team members who are empowered to propose solutions without fear.

Ensuring psychological safety—whether during regular work hours or on-call shifts—is crucial for making responders feel confident in their approach to problem-solving.

However, if the workplace culture punishes individuals for not resolving issues swiftly, it can dampen overall enthusiasm and commitment to the incident response process.

Moreover, in the midst of an incident, it is common to implement quick fixes that may not be ideal but are necessary at the moment. If team members face repercussions for these emergency measures, it may cause hesitation in future situations, with responders delaying action in search of "perfect" solutions.

Asking questions

The same goes for asking questions.

Particularly for new joiners and folks with less domain knowledge, questions are a critical part of continuous learning and development. If there’s precedent for making folks who ask questions feel lesser-than, you’re creating a culture of shame and, ultimately, a more mistake-prone team.

What happens when you make more mistakes? You have more incidents that were wholly preventable.

How you can promote psychological safety

Creating a "psychologically safe" organization isn't something you can implement overnight.

It takes time and effort to establish this kind of culture. But when everyone feels seen and supported, your organization can truly thrive. Here are a few things you can do to get one step closer to this.

Make it clear that it’s OK to not have the answers

First up, asking for help!

Just because someone is deemed ready to be on-call, or to join an incident handling rotation, doesn’t mean that they will always have the answers. Not knowing how to debug, diagnose, and resolve an issue is a totally normal and expected part of the responsibility.

Making it clear that this is OK is crucial to building an environment where folks feel comfortable and supported. Part of this is making it clear what the next steps are for escalations.

Anyone responding to incidents shouldn’t be afraid of, or dissuaded from, paging folks when they don’t know how to address an issue.

Doing this creates safety nets for everyone and ensures that incidents get resolved by the people who are best equipped to respond to them, and no one is shamed for not knowing the answer.

Look at the systems, not the individual

For teams that are accustomed to root cause analyses, adopting a contributing factor framework can really help here. When trying to figure out how to prevent repeat incidents, home in on the systems, processes, or infrastructure that allowed them to happen in the first place.

Facilitate open communication

If someone brings something up, listen to them. You want to create a culture where responders feel like they can bring their best selves to work every day, ideas and feedback included.

Especially during forums like post-mortem meetings, encouraging an open dialogue can go a long way towards making everyone feel acknowledged—and ultimately, allow you to build a better, more resilient product.

Encourage asking questions

While asking questions can be awkward at times, it really can make a big difference.

Chances are, someone else was thinking of the same question but was also too afraid to ask. Leading by example helps a lot. Lean on your more tenured engineers to lead the charge here. If folks see that tenured employees are openly asking questions, they are that much more likely to ask questions themselves.

Psychological safety goes a long way toward helping your team feel supported

Creating a culture with psychological safety is not something that happens in the background—it’s an ongoing, iterative process that’s truly never finished.

But that doesn’t mean it’s not worth the effort!

By putting systems and processes in place to ensure that all of your teams feel supported, you enable them to do their best work. When people know that their mistakes won’t be held against them, that they can ask questions when necessary, and that they can ask for help without repercussions, you remove an undue burden on them.

Everyone comes to work trying to do their best, and when you create a culture of psychological safety, you allow them to do just that.
