From on-call to insights, and everything in between, this guide is the ultimate resource for modern organizations managing incidents.
Chapter I
On-call is a must-have for modern teams, ensuring the right people are ready to respond when things go wrong. In this chapter, we dive into what on-call truly means, who should be involved, and how to make on-call a human-friendly experience.
Chapter II
In this chapter, we'll define what an incident is, explain how to understand its impact, and establish a common language you can use across your organization.
Chapter III
The master class on responding when things go wrong! We’ll cover everything from declaring incidents and assembling the right team to communicating with your organization and customers.
Chapter IV
Incidents are a powerful way to learn about your organization and systems. In this chapter, we’ll explore how to turn incidents into learning opportunities, share expertise, and capture actionable steps to reduce recurrence and minimize future impact.
Chapter V
While each incident provides individual lessons, analyzing them collectively can uncover patterns, track operational load, and enable deeper thematic insights. This chapter explores metrics beyond traditional measures like MTTR, using rich data to help improve incident response and support team health.
Further reading
Our philosophy on operational excellence.