
Like many SaaS businesses, we have an on-call rota to enable us to provide 24x7 cover if there are problems with incident.io. We have a 'pager' which will alert the relevant person if something unexpected happens in our app, so that they can investigate and fix it if needed.
Note: This was adapted from an internal document we wrote about how we think about on-call at incident.io.
We're building a product that people depend on 24x7, all year round. It's important that it always works, which means we need to support it around the clock. During office hours this is a shared responsibility across the whole team, but to limit the impact out of hours, we have a dedicated person 'holding the pager'.
Being on-call doesn't come without its benefits. By taking on the operational responsibility for the work we do, we tighten the feedback loops between shipping and running. This helps us to make pragmatic engineering decisions and provides a healthy tension between shipping new code and supporting and improving what we have.
Additionally, our product is designed, partly, to support folks who are on-call. There's no better way for us to empathise with our customers and find the opportunities and rough edges than to do the job ourselves.
As an incentive, and to compensate for the inconvenience of having to remain close to your laptop, we'll pay a fixed amount per week to anyone who's on-call.
We'll calculate pay automatically from our on-call schedules, and take overrides into account too. We'll calculate pay down to the minute, so if you cover someone for an hour while they go to the shops, you'll be paid for that time.
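To make the per-minute calculation concrete, here is a minimal sketch of how pay could be derived from a schedule plus overrides. It is not our actual implementation: the `Entry` type, the function names, and the weekly rate of 504 are all hypothetical, chosen only to keep the arithmetic easy to follow. It assumes an override always takes precedence over the scheduled shift it overlaps.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


@dataclass(frozen=True)
class Entry:
    """A scheduled shift or an override (hypothetical shape)."""
    person: str
    start: datetime
    end: datetime  # exclusive


def on_call_minutes(shifts, overrides):
    """Count on-call minutes per person; overrides win over scheduled shifts."""
    minutes = defaultdict(int)
    for shift in shifts:
        # Split the shift at every override boundary that falls inside it.
        cuts = {shift.start, shift.end}
        for override in overrides:
            for t in (override.start, override.end):
                if shift.start < t < shift.end:
                    cuts.add(t)
        points = sorted(cuts)
        for seg_start, seg_end in zip(points, points[1:]):
            # Each segment is either fully inside an override or fully outside.
            holder = shift.person
            for override in overrides:
                if override.start <= seg_start and seg_end <= override.end:
                    holder = override.person
            minutes[holder] += int((seg_end - seg_start).total_seconds() // 60)
    return dict(minutes)


def on_call_pay(shifts, overrides, weekly_rate):
    """Pro-rate a fixed weekly rate down to the minute."""
    return {
        person: round(mins * weekly_rate / MINUTES_PER_WEEK, 2)
        for person, mins in on_call_minutes(shifts, overrides).items()
    }


# Example: Alice holds the pager for a week; Bob covers one hour of it.
shifts = [Entry("alice", datetime(2024, 1, 1), datetime(2024, 1, 8))]
overrides = [Entry("bob", datetime(2024, 1, 3, 14), datetime(2024, 1, 3, 15))]
print(on_call_pay(shifts, overrides, weekly_rate=504))
```

With these made-up numbers, Bob is paid for exactly the 60 minutes he covered, and Alice is paid for the remaining 10,020 minutes of the week.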
By compensating on-call we also aim to make overrides feel more fair, and avoid the need for more complex swaps of time. If someone offers to cover a day of your shift, they'll be paid for it so there's no need to feel indebted.
On-call payment is not expected to cover any time you spend working outside of hours. If you're paged and end up working in your evening, you should take time off in lieu. We trust you to manage this time yourself.
Being on-call unavoidably has an impact on your home life, but we want to provide the best possible experience. Here are a few ways we'll collectively help each other:

I'm one of the co-founders, and the Chief Product Officer here at incident.io.

