Working with distributed teams
It’s increasingly common that we’re not all sitting in the same room anymore, which makes managing incidents difficult. This chapter outlines best practices for making remote work work.
A team with remote members that doesn’t adopt a remote-first approach will exclude those who aren’t physically present. It’s not something we do deliberately, but it’s part of being human: if you get caught up in an in-person conversation, you’re not going to notice the notifications piling up on your silenced phone.
Pre-agreed communication channels
This brings us to the first key change you must make for remote response: ensuring key information about the incident response is broadcast via remote channels.
If you followed our guide on creating an incident space, then you’ll already have an incident channel, and perhaps a status page. You’ll want to use these to:
- Post all key updates, to align everyone on progress
- Keep track of all previous updates as an ‘on ramp’ for those who come later
It sounds simple, but there is a big challenge here: getting people to share the updates.
Without training (and ideally automation), responders are likely too busy to update all channels correctly all the time.
With a simple rule that all key discoveries come with an incident update, and that those updates are shared across a number of channels, you can make life much easier for anyone following along remotely.
incident.io can help by providing a single path to publish incident updates with the /inc update command. This can be configured to push the message to all the key locations, whether that’s Slack channels, email addresses, or someone’s phone.
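To make the "single path" idea concrete, here is a minimal sketch of what a fan-out publisher could look like. It is not how incident.io implements /inc update: the `IncidentUpdates` class, the `publish` and `on_ramp` methods, and the idea of a "sink" (any callable that delivers text to Slack, email, a pager, and so on) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List

# Hypothetical: a sink is any callable that delivers a line of text somewhere
# (a Slack channel, an email list, a phone). Real delivery code is out of scope.
Sink = Callable[[str], None]


@dataclass
class IncidentUpdates:
    """One publish path that fans each update out to every configured sink."""

    sinks: List[Sink]
    history: List[str] = field(default_factory=list)

    def publish(self, author: str, message: str) -> str:
        # Timestamp the update so latecomers can follow the order of events.
        stamp = datetime.now(timezone.utc).strftime("%H:%M UTC")
        line = f"[{stamp}] {author}: {message}"
        self.history.append(line)
        # Responders publish once; every channel receives the same text.
        for sink in self.sinks:
            sink(line)
        return line

    def on_ramp(self) -> str:
        # All previous updates, oldest first, for people who join later.
        return "\n".join(self.history)
```

In this sketch, responders only ever call `publish`, so no channel can be forgotten, and `on_ramp` gives anyone joining late the full list of earlier updates.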
Over-communicate in the incident channel
This doesn’t just apply to distributed teams, but it becomes more important when your team is distributed: make sure technical details, incident response resources, and links are tracked as messages in your incident channel.
As we explain in Communicating within your organization, the incident channel can be your team’s shared memory, and help you coordinate on what you’ve tried or done throughout the response if you ever need to check the record.
For those who are remote, unless you push your actions into the channel, it’s unlikely they’ll ever know you did them. Over-communicate to avoid getting in each other’s way.
When required, go high-definition
You’re leaning heavily on the incident channel now which is great, but it’s not the answer to everything. Especially at the start of an incident when you’re getting your bearings, coordinating via instant messaging just isn’t going to cut it.
It’s time to go HD: open a Slack huddle or a video call and benefit from much more efficient communication.
That’s not all, though: if you just chat in Zoom, only you and whoever else was on that call are going to know what happened.
You have two choices:
- Nominate a scribe in the Zoom chat to extract actions and assign roles as you go, ideally using your incident tooling to coordinate
- Try to remember what you discussed and recap it in your incident channel once you’re done with the call
The first option here is clearly preferable, as it prevents you forgetting parts of what you discussed.
If you’re using incident.io, assigning roles and managing incident actions can be done with minimal effort while you’re Slack Huddling!
Assign actions
When you’re all in the same room, you know who is working together and roughly what they’re working on. You might even be able to tell how well things are going, depending on the frequency of swear words coming from their corner!
But when you’re remote, you don’t have any of those cues. It means you run the risk of duplicating work, or even colliding with someone else working in a similar area.
Now more than ever, it’s time to track incident actions, and work hard to keep them up-to-date.
If you’re using incident.io, you’ll create actions via either the Slack channel or the web dashboard, and each action can be assigned to a person. There are even useful tools like “request an owner” on an action, where we’ll message people who are unassigned to ask if they can pick this up.
Whatever tool you use, ensure it’s up-to-date and comprehensive. It will improve your response, and make it possible to switch people in and out of the incident without losing context.
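If you are tracking actions in your own tooling, the core of it can be small. The sketch below is an illustration only, not incident.io’s data model or API: the `Action` type and the `unowned`, `free_responders`, and `owner_requests` helpers are hypothetical names, and the "request an owner" logic is a simplified guess at the behavior described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Action:
    """A single piece of work in the incident (hypothetical shape)."""

    description: str
    owner: Optional[str] = None
    done: bool = False


def unowned(actions: List[Action]) -> List[Action]:
    # Open actions nobody has picked up yet.
    return [a for a in actions if a.owner is None and not a.done]


def free_responders(actions: List[Action], responders: List[str]) -> List[str]:
    # Responders with no open action assigned to them.
    busy = {a.owner for a in actions if a.owner is not None and not a.done}
    return [r for r in responders if r not in busy]


def owner_requests(
    actions: List[Action], responders: List[str]
) -> List[Tuple[str, List[str]]]:
    # For each unowned action, list the people who could be asked to own it.
    free = free_responders(actions, responders)
    return [(a.description, free) for a in unowned(actions)]
```

Keeping the list explicit like this is what makes handovers possible: anyone joining can see which actions are open, who owns them, and which ones still need an owner.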
Too many cooks
While you miss the advantages of being right next to your fellow responders, embracing a distributed response means you can benefit from a variety of perspectives and skillsets, irrespective of location.
But now you have a different problem: unlike the war room, your incident channel has no size limit, and can feasibly grow to fit your entire company.
Just as you should limit the number of responders physically in the room, you should avoid having too many people participate in the incident channel. If you don’t, you risk distracting a large part of your organization, when one of the goals of great incident response is to allow business as usual to continue. Your responders are also likely to feel nervous with a large audience watching, and less able to perform.
You’ll want to solve this by:
- Directing people invested in the incident outcome to a place where they can consume updates without impacting the response
- Keeping track of the people who are playing an active role in the response and those who are just observing
- Leaning on your incident roles to ensure all aspects of the response are covered, and encouraging people who aren’t strictly necessary to return to their normal jobs
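One way to keep track of who is active and who is observing is to derive it from data you already have. The following is a rough sketch under stated assumptions: it treats anyone holding an incident role or owning an action as an active responder and everyone else in the channel as an observer. The `split_channel` function and its inputs are hypothetical, and your own definition of "active" may differ.

```python
from typing import Dict, List, Set, Tuple


def split_channel(
    members: Set[str],
    roles: Dict[str, str],
    action_owners: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split channel members into (responders, observers).

    roles maps a role name (for example "lead") to the person holding it.
    action_owners is the set of people who own at least one open action.
    """
    active = set(roles.values()) | action_owners
    responders = sorted(m for m in members if m in active)
    observers = sorted(m for m in members if m not in active)
    return responders, observers
```

The observers list is the group you would point towards a status page or updates channel, rather than the incident channel itself.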
Discipline
The final note might be obvious, but is really important: especially when your team is distributed, you need to be more disciplined in your incident response process.
If you have incident roles, use them! Likewise, incident actions should be tracked and assigned so everyone knows who’s doing what.
Failure to follow your incident response process will all but guarantee confusion, with different people in the response team following different playbooks. That won’t lead to good outcomes, and can seriously impair your ability to fix the issue, especially when acting in a remote team.
You may have tried fighting for this at your own company without success: you’re not alone, we’ve all been there! Our advice is to think carefully about why people don’t want to follow the existing response process, rather than assume people have bad intentions.
More often than not, people diverging from the process means the process itself is broken, or too difficult to follow. You can fix this by adopting a tool like incident.io where you can encode your business’ process and greatly simplify the lives of responders when following it, helping them easily adhere to the practices you’ve agreed in advance.