
Synchronizing mental models

In the heat of an incident, having a clear and shared understanding of what’s going on is absolutely crucial to effective response. But often what actually happens is that people involved in incidents build their own picture and narrative of the event, shaped by their own expertise, their past experiences, and what they’re seeing and hearing as the incident develops.

The picture and perspective each person builds is often referred to as a mental model. And when the mental models of people in the same incident are misaligned, it can lead to sub-optimal decision making, additional stress, and reduced operational efficiency – not ideal for any organization.


Mental models will always play a part in how we respond to incidents, and whilst it’ll never be possible to fully synchronize our views (and that's a good thing!), there’s enormous value in being able to describe the known parts of our organizations and technical systems to provide a solid and shared foundation of understanding.

This is one of the reasons we built Catalog, which allows us to flexibly define how the entirety of our organization is structured and connected, so responders can work from the same common understanding.

The double-edged sword of mental models

Each of us develops a mental model of how systems work, built on our unique experiences, expertise, and exposure to past events. These mental maps help us navigate complex systems, but they can also act as double-edged swords.

Keeping these mental models up to date imposes an additional cognitive load on us, and often we enter an incident with a model that’s only correct as of some date in the past.

The extra effort of context loading and filling in the blanks, especially during the adrenaline-filled moments of an incident, can significantly slow down response times. Additionally, the associated cognitive burden can affect decision-making speed, hamper effective communication, and ultimately lead to operational delays we really can't afford.

When everyone operates from their individual mental models, a shared understanding or state becomes elusive. It's akin to having multiple navigators on a ship, each with a different map. This lack of uniformity can seed risks as teams struggle to interpret and contextualize incidents in the same way. The absence of a shared state can muddy the waters of communication, making it harder to agree on the most effective course of action.

Even more concerning is the potential for these mental models to be inaccurate. When our perceptions don't match how systems actually function, misunderstandings and flawed assumptions can lead to decisions that veer away from optimal incident resolution and instead exacerbate problems.

This misalignment can be exceptionally harmful, leading to further operational hiccups or, in the worst case, critical system failures.

♻️ Rebooting the (non-)production database

I was once responding to a minor incident addressing some database performance issues. The database was only used by non-production services, and since it was an out-of-hours response, I decided to take the entire cluster offline for a full reboot.

This made perfect sense given the context I had at the time.

As it happens, because of some recent capacity constraints, the cluster had started being used by some new production services. My reboot inadvertently upgraded my minor performance incident to a full-blown outage.

If I'd had the additional context at the time, I’d have likely taken a very different course of action 😅

A shared and dynamic map of your organization

This is where our newly launched feature – Catalog – comes into play. Designed to address the challenges arising from differing mental models and increasing organizational complexity, Catalog is more than just a feature. It's a foundational building block for levelling up incident response across the entire organization.

👩‍🏫 A primer on the Catalog

The Catalog is a flexible data structure for defining types, which represent the 'things' in your organization, and the connections between them.

The Catalog plugs into all the systems you already use, from Service Catalogs to CRMs and PagerDuty to HR systems. It's designed from the ground up to be a live view of your organization, all in one place.

So if we have Teams and Business Services defined as types, we can model a link between them that expresses which teams own which business services. And with this defined, we can use it in automation workflows to solve actual business problems like:

When:
- Customer support raises a `Critical` incident affecting a specific business service

Use Catalog to:
- Search for all teams that are responsible for that service
- Find out who's on-call
- Page them automatically with the correct PagerDuty configuration
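
To make the workflow above a little more concrete, here's a minimal sketch of the idea in Python. It is not the Catalog API: the `Team` and `Catalog` classes, the `on_call` field, and the `responders_to_page` function are all hypothetical names made up for illustration, and the sketch returns the people to page rather than calling PagerDuty.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Team:
    name: str
    on_call: str  # hypothetical: whoever currently holds the pager


@dataclass
class Catalog:
    # Hypothetical link: business service name -> teams that own it.
    owners: dict[str, list[Team]] = field(default_factory=dict)

    def link(self, service: str, team: Team) -> None:
        """Record that `team` owns `service`."""
        self.owners.setdefault(service, []).append(team)

    def teams_for(self, service: str) -> list[Team]:
        """Return the owning teams, or an empty list for an unknown service."""
        return self.owners.get(service, [])


def responders_to_page(catalog: Catalog, service: str, severity: str) -> list[str]:
    """Return who to page; in this sketch only Critical incidents page anyone."""
    if severity != "Critical":
        return []
    return [team.on_call for team in catalog.teams_for(service)]


catalog = Catalog()
catalog.link("Payments", Team("Payments Platform", on_call="alice"))
catalog.link("Payments", Team("Core Database", on_call="bob"))

print(responders_to_page(catalog, "Payments", "Critical"))  # ['alice', 'bob']
```

The point of the sketch is that the responder never has to remember who owns what: the ownership link lives in one shared place, and the automation reads from it.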

Catalog enables you to create a tangible, shared map of your organization. It's a manifestation of your operational landscape, covering everything from teams and software services to physical devices and business functionalities. You can link these entities to map their relationships, creating a comprehensive network that mirrors your organization's reality.

With a shared map, everyone is working from the same base. We're speaking the same language, interpreting issues in the same context, and that dramatically improves the quality and speed of decision-making. When we all see the same picture, we reduce the inconsistencies and communication gaps that differing mental models can create.

Beyond providing a shared understanding, Catalog also facilitates automated actions based on this common picture. By integrating the shared understanding of your organizational landscape into your systems, you can automate responses to various situations, thereby offloading a lot of the mental overhead.

🤖 Catalog powered automations

Catalog is opening up a whole world of powerful, unintrusive, and assistive automation for incident response:

🔀 Routing notifications
When your customers notify you that a particular business service is impacted, we can use the Catalog to find the right teams and get them in the room.

🤝 Looping in Sales and Customer Success
If an outage affects a subset of high-profile customers, Catalog can automatically pull in the right Customer Success people to handle proactive communication.

🌐 Joining the dots in Engineering
When you’re investigating a database issue, you can quickly scan for dependencies, owning teams, responsible executives and any context from past issues.
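
As a rough illustration of the dependency scan, the sketch below walks a small, made-up dependency graph to find everything that relies on a database, then reports which of those things are in production and who owns them. The entry names, the `production` flag, and both functions are assumptions for this example rather than anything Catalog defines. It's the kind of check that would have flagged the production services in the reboot story earlier.

```python
from collections import deque

# Hypothetical catalog links: each entry maps to the things that depend on it.
DEPENDENTS = {
    "orders-db": ["reporting-service", "checkout-service"],
    "checkout-service": ["storefront"],
    "reporting-service": [],
    "storefront": [],
}

# Hypothetical attributes attached to each entry.
ENTRIES = {
    "orders-db": {"team": "Data Platform", "production": False},
    "reporting-service": {"team": "Analytics", "production": False},
    "checkout-service": {"team": "Payments", "production": True},
    "storefront": {"team": "Web", "production": True},
}


def impacted_by(component: str) -> list[str]:
    """Breadth-first walk over everything that directly or indirectly depends on `component`."""
    seen: list[str] = []
    queue = deque(DEPENDENTS.get(component, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(DEPENDENTS.get(current, []))
    return seen


def production_impact(component: str) -> dict[str, str]:
    """Map each impacted production entry to its owning team."""
    return {
        name: ENTRIES[name]["team"]
        for name in impacted_by(component)
        if ENTRIES[name]["production"]
    }


print(impacted_by("orders-db"))
# ['reporting-service', 'checkout-service', 'storefront']
print(production_impact("orders-db"))
# {'checkout-service': 'Payments', 'storefront': 'Web'}
```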

This isn't just about reducing cognitive load; it's about enhancing operational efficiency and simplifying incident response.

The impact of Catalog on incident response

With a shared map, incident response becomes faster, more efficient, and significantly less stressful. No longer are teams scrambling to reconcile different interpretations of the situation. Instead, they can focus on what really matters: resolving the incident.

By aligning everyone's understanding of the system, Catalog promotes more effective communication and decision-making. It paves the way for more synchronized actions, and the mental energy previously spent on grappling with individual mental models can now be directed towards problem-solving.

We believe the Catalog represents a paradigm shift in incident response. By breaking down the barriers posed by individual mental models and replacing them with a shared, comprehensive map, we can make things a little easier for those involved.

Interested in learning more? Head to incident.io and sign up for a demo or free trial.

Chris Evans
Co-Founder & CPO

I'm one of the co-founders and the Chief Product Officer of incident.io.
