The latest news from incident.io HQ

We’re building the best way for your whole organization to respond to, review, and learn from incidents. This is where we talk about how and why.

Article

What is an SRE? Understanding the responsibilities of this crucial function

Site reliability engineers have a broad set of responsibilities, but one thing is clear: their role is critical. In this article, we break down everything you need to know about SREs and what they focus on.

incident.io

13 min read
Engineering

How we achieved pixel-perfect polish during our Status Pages launch

When we launched Status Pages, we wanted to challenge industry norms and push our design polish to new levels. Here's how our engineering team worked with our design team to make this happen.

Dimitra Zuccarelli

10 min read
Article

Barcelona 2023 Company Offsite Recap

Last month, the team gathered for our second company offsite in sunny, oceanside Barcelona. Here's how it went.

Luis Gonzalez

9 min read
Engineering

Better security for your app's secrets

What comes after your default, out-of-the-box application secret solution? How do you add security to Heroku's environment variables, or go beyond putting secrets directly into Kubernetes? We've used GCP Secret Manager to improve our app secret handling, and this post shows how you can do the same.

Lawrence Jones

11 min read
Article

Effective incident escalations

Every organization has to confront its fair share of incidents. Whatever the sector or size, they all share one need: effective incident management. A crucial part of that management is incident escalation.

Chris Evans

9 min read
Article

Driving successful change: Understanding DORA's Change Failure Rate metric

By using DORA's change failure rate metric, organizations can highlight inefficiencies in deployment processes and prevent pesky incidents from repeating.

Luis Gonzalez

11 min read
Article

Service level indicators: 6 key metrics for effective incident management

In this article, I'll highlight six important SLI metrics that can help drive better incident management processes.

incident.io

10 min read
Article

SLA vs KPI: Breaking down the differences, and similarities, of these important metrics

In this article, we'll lay out the differences between SLAs and KPIs, and explain how each affects performance management.

Luis Gonzalez

11 min read
Article

Synchronizing mental models

When everyone has their own mental model, it can hinder our ability to respond to incidents. Catalog creates a shared operational map, enabling faster decision-making, automated workflows, and an overall streamlined response process.

Chris Evans

6 min read
