On this week's episode of The Debrief, we chatted with Jeff Forde, an Architect on the Platform Engineering team at Collectors.
With a background spanning finance, healthcare, and various product-led startups, Forde has honed his expertise in DevOps, site reliability, and platform engineering. Beyond his professional life, he's also a dedicated volunteer first responder and certified fire instructor in Connecticut, offering him a unique perspective on managing incidents of all types.
In our conversation, we discussed the many ways organizations can up-level their incident management programs, whether they're starting from zero, already well established, or somewhere in between.
We encourage you to listen to the full episode, but if you're looking for some quick takeaways, we've shared some below:
The why behind incident management
One of the key takeaways from Jeff’s experience is the importance of understanding why an organization implements an incident management program.
Whether it's for compliance, reliability, customer experience, or even life safety in some cases, aligning everyone on the team with these goals is crucial. This alignment ensures that efforts are focused on the aspects of incident management that truly matter, improving outcomes and effectiveness.
Common missteps in incident management
Forde identified two common missteps he's seen when organizations are trying to create an incident management program:
- Viewing incident management as a productivity drain: Many organizations mistakenly see incident management as a hindrance rather than an opportunity for learning and innovation. By focusing on outcomes and learning from incidents, companies can transform incident management into a tool for improvement.
- Trying to implement an end-to-end program all at once: Starting with a heavy-handed approach can be overwhelming. Instead, Forde suggests beginning with a more manageable, lightweight process and evolving from there, based on the company's size, industry, and specific goals.
Balancing act: innovation and incident management in startups
Startups, with their focus on rapid iteration and market fit, might not initially be willing to invest heavily in incident management.
However, establishing a culture of learning from incidents, even through simple post-incident reviews, can lay the groundwork for a blameless culture and continuous improvement.
Sharing findings with customers can also enhance trust and long-term relationships.
Maintaining a blameless culture
The concept of a blameless culture is crucial, Forde stresses. It's not about who took an action but why the action was taken. Understanding and improving processes to prevent future issues is more important than assigning blame.
This mindset fosters a learning environment where continuous improvement is possible.
Evolving incident management
Forde advises against trying to replicate the incident management processes of larger organizations without considering your own organization's unique context and needs.
Instead, focusing on the specific outcomes you want to achieve from incident management can guide you in implementing the most relevant aspects of the process for your organization.
Continuous feedback and improvement
Regular retrospectives and feedback sessions with responders help identify what's working and what's not, ensuring that the incident management process remains aligned with the team's needs and goals.
Forde recommends at least quarterly check-ins, even for small teams, to prevent burnout and ensure the process evolves as the organization grows.
Incident management is a living process
Jeff's insights underline the importance of a tailored, evolving approach to incident management.
By starting from a clear understanding of why incident management is necessary, avoiding common pitfalls, and fostering a culture of continuous learning and improvement, organizations can turn incident management into a powerful tool for innovation and growth.