Caring for your team

At the heart of any high performing incident response team is the ability to stay calm under pressure.

Put simply, people make bad decisions when stressed. By their nature, incidents can be stressful, and it’s easy to get swept up in the excitement of solving the problem and forget all your preparation, experience and even basic things like caring for yourself.

The good news is that calm, just like panic, is contagious. And if you can stay calm while everything is on fire, it’s likely you’ll help your team stay calm too.

Here are some tips to care for your team and ensure they handle the stress positively.

Take breaks

Possibly the most effective “hack” to improve incident response is to take breaks as a team.

Depending on the stage of your incident, a break can look very different. As a general rule, you want to take a break:

When you get paged

Your phone has just gone off, probably with an anxiety-inducing ringtone, and you know something bad has happened. Perhaps it’s the middle of the night and you’ve just been woken up, or you were midway through cooking dinner.

Either way, your heart rate has increased, you’re breathing faster and your palms have gone a bit sweaty. Your body thinks it’s in danger, and if you race into response mode, you’ll only confirm that and increase your level of stress.

So stop. Even the most demanding of pagers won’t notice a 30-second delay in your response, and most are intended to handle a lot more.

After you’ve acknowledged the page (so your phone stops beeping), take a deep breath and steady yourself. You’ll be amazed at how effectively this can calm you down, and allow you to get in the right headspace to tackle the response properly, after telling your body you’re in control.

When impact has ceased

Most incidents have a moment where the immediate impact has been resolved. Maybe the system is now back online, or you’ve prevented the damage from worsening and the situation is now stable.

When this moment comes, you should call a break for the entire team. You’ve probably been at 100% for however long it’s taken to get to this moment, and you’ll do yourself a disservice by continuing without taking a moment to reflect and gather your thoughts.

It’s important, when this time comes, to mandate a break. If you’re the incident lead, you should be telling people to step away from their screens and stop working for a moment, so the team can come back ready for the next stage of response.

Before follow-up work

In complex incidents, there is often a long tail of work required to bring things back from stable to healthy again.

That long tail falls to responders who might have been working for a long while, and who have almost certainly been working at capacity for at least a short period.

Once you have a plan for recovery, before you start, take a break.

You have no idea if your plan will go well, or if the recovery process might cause the original issue to recur, or even worsen.

Take the opportunity and use the relative safety to reset your team, and have everyone return ready to face the worst, if it should happen.

Food matters

It might sound simple, but one of the most common incident response mistakes is for a team to forget to eat.

Imagine a response team who’ve just brought a service back to stability. They’re buzzed from the adrenaline of a major outage, and while tired they feel great about bringing things back online, and allowing customers to use the service again.

It’s been an hour since the incident started, and instead of pausing at this point, the team – energised by their success – jump straight to the next step of response, recovering the service back to full health.

They take another hour to make a plan, then half an hour to practice, and now they think they’re ready. Someone says they’re getting a bit hungry but it’s not urgent, this next part should be simple…

Except it’s not. About half an hour in, the recovery plan goes wrong and the system has gone back offline. This time it’s even worse than before, having taken down another crucial service along the way.

Stepping back for a moment, it’s now 8pm, this team have been in the office since 9am this morning and people were getting hungry hours ago. They would swap responders for more rested colleagues except the entire team were so wrapped up in the original incident that they’ve all forgotten to eat, and they’re now tired, hungry, and facing up to another high-pressure incident demanding their full attention.

Don’t let your next incident go bad because you forgot to order dinner.

Hold others to account

Taking breaks and ensuring people eat are just specific examples of ensuring your response team look after themselves. The issue is that people often forget this, because they’re focused so intensely on responding to the incident at hand.

Ensure your incident leads remind people to take breaks in order to avoid burnout. This is a core responsibility of the role, and it applies to the leads themselves too: everyone should be well rested and well fed to do their best work.