The final ingredient in a great on-call process is compassion. Being on-call isn’t like most job responsibilities: it unavoidably blurs the boundary between work and home.
There are a number of things you can do to create a compassionate on-call culture.
Schedule with flexibility
Scheduling doesn’t need to be top-down: each on-call team can decide what schedule works for them. The right answer will vary depending on lifestyles and pager load: if you get woken up every night you don’t want to be on-call for seven nights in a row!
If you’re running weekly or daily shifts, think about the best time to hand over. Optimise for the person finishing their shift: they are more likely to be stressed.
At incident.io, our engineering team runs on-call Friday-to-Friday, with a 5pm handover. This works great for us: as an engineer coming off the pager you get to finish work nice and early on a Friday and enjoy your evening.
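If your scheduling tool doesn't work out the handover time for you, the calculation is small enough to sketch. The snippet below is a minimal illustration, not how incident.io implements it: `next_handover` is a hypothetical helper, and it uses naive datetimes, so time zones and daylight saving changes are ignored.

```python
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday() counts Monday as 0


def next_handover(now, weekday=FRIDAY, hour=17):
    """Return the first weekly handover strictly after `now`."""
    # Days until the target weekday (0 if today is already that day).
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    # If today's handover has already happened, move to next week's.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# A Tuesday afternoon: the next handover is that week's Friday at 5pm.
print(next_handover(datetime(2024, 3, 5, 14, 30)))
```

Changing `weekday` and `hour` lets each team try out the handover that suits them.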
Relieve the pressure with overrides
Overrides are a core part of a good on-call process. Everyone should be able to ask for overrides so they can take part in their personal life, whether that’s going to the gym, date night, or school pickup. Taking the pager for an hour or two should be really common, and the whole team should feel comfortable requesting it, no questions asked.
Proactive overrides are also important: if you see that someone’s been paged overnight, why not take the pager from them for the morning so they can have a lie-in? Of course, you should always ask before giving the pager to someone, but within a supportive team environment there’s no need to ask before you take it away. Do let them know, though, so they can chill out!
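To make the mechanics concrete, here is a minimal sketch of how an override can sit on top of a base schedule. The `Shift` type and `who_is_on_call` function are hypothetical names made up for this example rather than any real product's API, and the rule that the most recently added override wins is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Shift:
    person: str
    start: datetime  # inclusive
    end: datetime    # exclusive


def who_is_on_call(at, schedule, overrides):
    """Return who holds the pager at `at`, or None if nobody is scheduled."""
    # Overrides take priority; if several overlap, the last one added wins.
    for shift in reversed(overrides):
        if shift.start <= at < shift.end:
            return shift.person
    # Otherwise fall back to the base schedule.
    for shift in schedule:
        if shift.start <= at < shift.end:
            return shift.person
    return None


# Alice has the Friday-to-Friday shift; Bob covers two hours on Tuesday evening.
schedule = [Shift("alice", datetime(2024, 3, 1, 17), datetime(2024, 3, 8, 17))]
overrides = [Shift("bob", datetime(2024, 3, 5, 18), datetime(2024, 3, 5, 20))]

print(who_is_on_call(datetime(2024, 3, 5, 19), schedule, overrides))
```

Because the override only masks the base shift, the pager returns to Alice automatically once Bob's two hours are up.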
Don't expect to be on-call and fully available
Being on-call can affect your day-to-day work if the pager is noisy or the problems are complicated. It’s important to consider this when planning work for the team: there’s nothing worse than feeling like your on-call responsibilities have put you on the back foot for the team’s primary goals.
Keep on-call reactive
You should not have to refresh a dashboard every hour to see if everything’s OK. If you’re on-call, you should be able to rely on your monitoring systems and operational teams to alert you when you’re needed. Investing in this will both improve your on-call experience and mean that you find out quickly if things do go south.
Look out for heroes
It’s common for a single team member (or small group) to take on the majority of the on-call burden. Whether they always take the overrides or jump in whenever they see something, they’re trying to do the right thing.
Letting individuals take this burden is convenient, but long-term it’s not good for team health. Everyone will start expecting this person to take everything, and they’ll feel under-appreciated. They may also burn themselves out pushing too hard, or fail to make progress in other areas as they’re focussing too much on reactive work.
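One way to spot this pattern early is to look at who actually responds to pages. The sketch below is a rough heuristic under stated assumptions: `page_share` and `likely_heroes` are hypothetical helpers, the input is simply a list of responder names (one entry per page), and the threshold of twice an even share is an arbitrary starting point rather than a recommendation.

```python
from collections import Counter


def page_share(responders):
    """Map each responder to their fraction of all pages handled."""
    counts = Counter(responders)
    total = sum(counts.values())
    return {person: n / total for person, n in counts.items()}


def likely_heroes(responders, rota, factor=2.0):
    """People whose share exceeds `factor` times an even split of the rota.

    Assumes `rota` is non-empty.
    """
    fair_share = 1 / len(rota)
    shares = page_share(responders)
    return sorted(p for p, s in shares.items() if s > factor * fair_share)


# Made-up data: ten pages across a four-person rota.
rota = ["alice", "bob", "carol", "dan"]
responders = ["alice"] * 7 + ["bob"] * 2 + ["carol"]

print(likely_heroes(responders, rota))
```

A name showing up here isn't a criticism of that person: it's a prompt to ask how the rest of the team can share the load.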
Make sure that there are enough people on your on-call rota, and that when people leave the rota they are quickly replaced. Succession planning matters too: if you know someone is planning to leave or change roles, consider how that impacts your on-call rota.
For further insights, see our blog post: No capes: the perils of being a hero-engineer