The latest news from incident.io HQ

We’re building the best way for your whole organization to respond, review and learn from incidents. This is where we talk about how and why.

Article

A guide to post-mortem meetings and how we run them at incident.io

Post-mortem meetings can play a crucial role in fostering an environment of continuous learning. Here's how we do them!

incident.io

8 min read
Article

Whose fault was it anyway? On blameless post-mortems

While blameless post-mortems are a great idea on the surface, if taken to the extreme, they can muddy how much you actually learn from incidents.

incident.io

7 min read
Engineering

Keeping the codebase consistent with Pattern Parties

As a codebase evolves, it’s common to see some divergence in the design patterns within it.

Kelsey Mills

7 min read
Article

Better learning from incidents: A guide to incident post-mortem documents

Post-mortem documents are a great way to facilitate learning after incidents are resolved.

Luis Gonzalez

8 min read
Engineering

Clouds, caches and connection conundrums

During a recent infrastructure migration into Google Cloud, we kept running into a pesky issue without a clear cause. Here, we dive into the twists and turns we took to finally figure out what the smoking gun was.

Ben Wheatley

13 min read
Article

How we’ve made Status Pages better over the last three months

A few months ago we announced Status Pages, the most delightful way to keep customers up to date about ongoing incidents. Since then, we've launched several features to add an extra bit of delight. Read on to learn more.

incident.io

8 min read
Article

The balancing act of reliability and availability

To prevent issues like downtime, you have to focus on the reliability and availability of your product. But there's a balance to be struck here.

incident.io

8 min read
Article

Incident management vs problem management: understanding the connection between the two

While problem management and incident management may seem different, they're two sides of the same coin.

Luis Gonzalez

4 min read
Engineering

Practical guidance for getting started as a Site Reliability Engineer

Here are a few strategies that might help you build up context, find the problems that really matter and turn these into a plan of action.

Ben Wheatley

7 min read

Stay in the loop: subscribe to our RSS feed.

Move fast when you break things