Picture the scene. You’re the head engineer at a Formula 1 racing team, moments away from the start of a race, when a minor mistake by your driver sees your car damaged on the way to the grid.
You're faced with a decision: take your car back to the garage and face a significant penalty, or attempt to fix the car from the grid faster than you’ve ever done it before, knowing that failure will likely mean withdrawing.
This is the situation the Red Bull racing team faced at the 2020 Hungarian Grand Prix, and what happened next is a glowing example of effective incident response.
What’s more, and unlike many of the incidents we face on a daily basis, this one can be watched back in glorious HD, and it’s well worth 8 minutes of your time.
Assembling a team in seconds
One of the common patterns we see in incidents is the formation of ephemeral teams, where people who don’t regularly work together find themselves in an environment with new faces, unfamiliar challenges, and higher than usual pressure.
In the Red Bull incident, the driver is involved and first on the scene. Alongside a host of automated telemetry, he reports what happened, and what he’s seeing on the ground. On the other end of the line, the Sporting Director is back at the pits and immediately assumes overall responsibility. In the seconds that follow, the driver and director establish a common understanding of the problem, and the decision is made to fix the car from the starting grid. The mechanics in the garage are briefed and the team is sent to assemble on the grid.
At this stage, we’re about one minute in and we have someone leading, an understanding of the situation, and a team forming at the grid where this incident will be managed.
Procedures under pressure
It’s common to find yourself outside your comfort zone during an incident. Whether it’s dealing with a system in a degraded state, or following a process you’ve never followed before, incidents have a tendency to throw surprises your way.
With 20 minutes to race start, the team is attempting to fix the broken car – not from the garage as normal – but from the starting grid, surrounded by other cars and people. This is unfamiliar territory where they’re required to ferry the tools and parts they need between two separate locations. There appears to be some ambiguity in what’s required and where it's needed, so the director steps in to clear things up:
🧑🏼🚒 "Who is on garage?"
🧑🏼🔧 "I’m on garage. I’m on my way [with parts]"
By clarifying responsibilities, everyone in the incident knows who to go to for what. Shortly after, the lead reminds everyone involved that they have standard procedures for this, providing a gentle nudge whilst also remaining empathetic to the pressures on the folks dealing with the issue.
We have procedures for requesting things back and forwards. I think most of those are being followed, but just in case it’s getting confused...
What’s great here is that the lead doesn’t throw procedures at the responders. There’s no sense of “you’re not doing it right” or “you’re not following the book”. They simply remind people of the process so they can use it in the incident, and then jump in to relieve pressure on the situation. We can see the impact this has in the remainder of the incident when we observe the change in communication style:
👩🏻🔧 "Copy, we’ve got the [list of parts] on their way"
🧑🏼🔧 "We’ve got [the part] now… so that’s going on"
Self-organization and communication
We can't expect a lead to be responsible for every action that needs to take place in an incident. Everyone has a finite breadth and depth of knowledge, and we want to draw on the relative expertise of everyone involved to get things back on track. We want people to communicate what they're doing so everyone has the fullest possible picture of the incident, which we see in the Red Bull incident with frequent “heads-up” messages like this:
🧍♂️ "I’ve told Iggy from the FIA what we’re doing"
Another good pattern in self-organization is communicating intent with the option for others to object. In this scenario that might look like: "Unless there's good reason not to, I'm going to let the FIA know what's going on here". Which option to choose is entirely context dependent, but if there's doubt as to whether an activity is the right one, biasing to communication before action is usually a safe bet.
Adaptability in response
Midway through the Red Bull incident, the lead and chief mechanic discuss how long the fix is likely to take, both demonstrating a shared understanding of the deadlines they’re up against.
🧑🏼🚒 "What’s the ETA?"
🧑🏼🔧 "Probably longer than what we’ve got, but we’ll give it a go"
🧑🏼🚒 "We’d have to have done it faster than we’ve ever done it"
This is a lovely example of resilience and adaptability in incidents! The team is working in an unfamiliar place, with time pressures greater than they’ve ever managed, and with a high degree of uncertainty, and yet they remain calm and focused on resolving the problem.
Assembling the bigger picture
With 17 minutes to the race start, a lot has happened. People are self-assembling around the problem and picking up actions in their respective areas. The right things are most likely happening, but it’s the lead’s role to join the dots and make sure this is the case. Aggregating the available information is crucial to good decision making, and we see the following request from the lead:
Let’s focus on getting some information to me. I need to know how long those bits will take to change
Soon after we see a brief discussion between the lead and the engineer which helps them develop a shared understanding of the deadlines they’re working towards. Note here that the lead does not put the deadline on the team; it’s something they establish together:
🧑🏼🚒 "At what point do we abandon this so we can comfortably get the car off the grid? Do we need 4 minutes to get the car off the grid?"
🧑🏼🔧 "Yeah about that"
🧑🏼🚒 "Then you know your deadline"
Many of us will have found ourselves in incidents, deep in focus, and being distracted by someone asking for a status update. It’s not an unreasonable situation; incidents are founded on communication and information flow, so update requests are to be expected. So what’s the solution? Asking empathetically, in a way that acknowledges you’re aware of what they’re dealing with. At 14 minutes to the race, we see a perfect example of this, where the lead is looking for a progress update from the team:
I need to know your ETA on having that car repaired… have you got the presence of mind to be able to talk to me?
High frequency comms
In the final minutes of the incident, we see a flurry of activity from the people involved in the incident. It’s not uncommon for incidents to have pinch points, and increasing the frequency and detail of comms to keep everyone aligned can help to avoid complications.
🧑🏼🔧 "Wheel on lads, wheel on"
🧑🏼🚒 "30 seconds until wheels need to be fully fitted"
🧑🏼🔧 "Still over 20 seconds, plenty of time"
🧑🏼🚒 "Can I confirm, all four wheels are fitted?"
The lead wants the team to be fully aware of the time pressures, and the engineer wants the mechanics to remain calm whilst finalising the fixes. The overall status is clear to all involved.
In just under 18 minutes, Red Bull took a car from a state of disrepair, through to full race readiness, having fixed it on the starting grid faster than they’d ever done it before. At the end of the race the car placed 2nd overall.
🏎 "I want to say thank you to the mechanics. They saved the day, you guys are legends."
Incident response we can all aspire to!
Image credit: Jp Valery
I'm one of the co-founders and the Chief Product Officer of incident.io. I've spent my whole career working in engineering.