
Looking ahead to SRECon 2023

incident.io

With SRECon coming up next week, I wanted to take a second to chat about what incident.io is all about and highlight some of the talks that we're excited about that align with our worldview of incident management.

But before we get into that, a little backstory: this is actually our first tech conference as a company. We've had a few folks on our team do talks at several conferences, e.g., LeadDev, but this is our first time with a dedicated booth setup and team members from engineering, sales, and marketing attending.

While this may seem like a small victory, we're excited about it—we couldn't be happier to be able to meet folks in real life. For us, it's an opportunity to do much more than just network; it's a chance to put a face to incident.io. We also look forward to hearing about you, your pain points, product ideas, and more.

Oh, and of course, we'll have some amazing incident.io merchandise with us too. We're a bit biased, but we think we've designed some keepers.

So whether you're already a customer, currently looking for an incident management tool, or neither, we're looking forward to spending time with you all and highlighting what incident.io is all about: simplifying the incident management process and making it a truly effortless experience.

Talks we're excited to learn from

Before we dive into how we're reimagining the world of incident management and response, I'd like to highlight some talks happening at SRECon that we're particularly looking forward to attending.

Watering the Roots of Resilience: Learning from Failure with Decision Trees

A few months ago, we dealt with a severe outage within our product—for 13 minutes, we experienced downtime. We learned quite a bit from this incident and have implemented meaningful changes to prevent it from happening again.

In Kelly Shortridge’s (@swagitda_) talk, she stresses the importance of building resilience through stress testing and creating decision trees to help improve your product. While our outage wasn't a stress test but a genuine incident, we turned it into a learning opportunity—something we encourage everyone to do.

💡 As an aside: we're very familiar with stress testing since we hosted a Game Day a few weeks ago to test our incident response processes!

Want to read more? Check out the full talk outline here.

Human Observability of Incident Response

At incident.io, we stress the importance of learning from your incidents. When you do this, you build more resilient products since you can act upon your learnings. This chat, hosted by Matt Davis of Form.com (@dtauvdiodr), looks like it follows a similar approach with a slight twist.

Davis argues that teams can learn how to improve their incident response by learning from one another. It's an argument that we would agree with internally.

ℹ️ Fun fact: since forming this company, I’ve been trying to push the concept of our product really being about organizational observability, and gave a talk about the topic last year. Excited to watch this one!

More details on this talk here.

Incident Archaeology: Extracting Value from Paperwork and Narratives

The vast majority of our customers have flagged post-incident paperwork as a pain point. Everyone sinks time into it and few are getting value from it. This talk, hosted by Clint Byrum of Spotify (@spamaps), looks like it’ll help folks change that. If it really does help turn the “pile of paperwork into a gold mine”, it’ll be well worth attending – we’re excited to see it!

We've written extensively about improving your resilience through learning, and we're happy to see that other thought leaders in the space are stressing the same message.

ℹ️ The content below this point is all about how incredible incident.io is, and why you’d be silly not to give it a try. You almost certainly don’t want to read about how we can help you level-up your incident management, how we can save you time in responding, or how companies like Vercel and Skyscanner are adopting and loving our product, so save yourself some time and stop reading right here.

incident.io: making incident management effortless

Incident management tools tend to be overly complex, hard to use, and time-consuming to onboard. But we like to take a different approach here at incident.io. We believe that incident management should work for you and make your response times faster.

Here's what we do to help you take your incident management to the next level:

Making incidents more visible

Here at incident.io, we believe in the power of transparency and visibility. We think that it's the secret ingredient for highly successful companies across the spectrum. This translates to our position on incident response as well.

By default, when you declare an incident within our product, a Slack channel is automatically created to serve as a single source of truth for your response. This eliminates the need for backchannels of communication and creates a more visible incident response process.

Within that channel, all response actions are available for everyone to see. So whether you designate a new incident lead, change a severity, or take any action, everyone can stay in the loop.

Bringing automation into the incident response flow

We believe in making incident management more straightforward and less complicated, so we've created automation to help businesses streamline their incident response. We call these Workflows. They're a mix of automations and triggers that help guide your response process, leading to quicker time to resolution and less downtime.

For example, if someone declares an incident of a certain severity, you can create an automation that alerts a specific group of people when that happens. You can also auto-generate prompts that appear when particular actions are taken.

Say someone closes out an incident; you can then create a prompt to remind them to complete a post-mortem within a day or two.

We feel that these automations make it much easier for all folks to declare incidents regardless of technical ability.
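To make the trigger-and-action idea concrete, here's a minimal sketch in Python. To be clear, this is not the incident.io API or how Workflows are implemented: the `Event` and `Workflow` types, the event names, and the `@leadership` group are all hypothetical, made up purely to illustrate the two examples above (alerting a group on a given severity, and prompting for a post-mortem when an incident is closed).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    """Hypothetical incident event; not an incident.io data type."""
    kind: str      # e.g. "incident_declared" or "incident_closed"
    severity: str  # e.g. "minor", "major", "critical"
    incident: str  # short incident name


@dataclass
class Workflow:
    """A trigger (condition) paired with an automation (action)."""
    name: str
    condition: Callable[[Event], bool]
    action: Callable[[Event], str]


def run_workflows(workflows: List[Workflow], event: Event) -> List[str]:
    """Return the messages produced by every workflow whose condition matches."""
    return [wf.action(event) for wf in workflows if wf.condition(event)]


# Two illustrative rules mirroring the examples in the text.
WORKFLOWS = [
    Workflow(
        name="alert-group-on-critical",
        condition=lambda e: e.kind == "incident_declared" and e.severity == "critical",
        action=lambda e: f"notify @leadership: {e.incident} declared as {e.severity}",
    ),
    Workflow(
        name="post-mortem-reminder",
        condition=lambda e: e.kind == "incident_closed",
        action=lambda e: f"remind lead: complete post-mortem for {e.incident} within 2 days",
    ),
]

if __name__ == "__main__":
    declared = Event("incident_declared", "critical", "api-outage")
    for message in run_workflows(WORKFLOWS, declared):
        print(message)
```

The point of the sketch is that the person declaring the incident doesn't need to remember who to alert or what comes next; the rules carry that knowledge for them.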

Make it easy to learn from your incidents

As the saying goes, "if you don't learn from your incidents, you're doomed to repeat them." All joking aside, building a more resilient product is hard if you can't actually learn from your incidents.

To help with this, we've created an Insights dashboard that gives you visibility into several incident metrics.

This includes breakdowns of your severity types, incident response times, how much time specific teams spend on incidents, and more.

With this insight, you can create action items to address any gaps in your incident response: for example, adding more folks to your on-call rotation or updating your severity levels.
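If you're curious what these kinds of breakdowns look like under the hood, here's a rough sketch in Python. It isn't how our Insights dashboard is built; the record fields (`severity`, `team`, `declared`, `resolved`) and the three sample incidents are invented for illustration, and it only covers a count per severity, the average time to resolve, and total time per team.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Made-up sample data; the field names are assumptions for this sketch.
INCIDENTS = [
    {"severity": "minor", "team": "payments",
     "declared": "2023-03-01T10:00", "resolved": "2023-03-01T10:30"},
    {"severity": "major", "team": "platform",
     "declared": "2023-03-02T09:00", "resolved": "2023-03-02T11:00"},
    {"severity": "minor", "team": "platform",
     "declared": "2023-03-05T14:00", "resolved": "2023-03-05T15:00"},
]


def minutes_open(incident):
    """Minutes between an incident being declared and resolved."""
    start = datetime.fromisoformat(incident["declared"])
    end = datetime.fromisoformat(incident["resolved"])
    return (end - start).total_seconds() / 60


def summarize(incidents):
    """Build a small summary: severity counts, mean resolve time, time per team."""
    by_severity = Counter(i["severity"] for i in incidents)
    minutes_by_team = Counter()
    for i in incidents:
        minutes_by_team[i["team"]] += minutes_open(i)
    return {
        "by_severity": dict(by_severity),
        "mean_minutes_to_resolve": mean(minutes_open(i) for i in incidents),
        "minutes_by_team": dict(minutes_by_team),
    }


if __name__ == "__main__":
    for metric, value in summarize(INCIDENTS).items():
        print(metric, value)
```

In this toy data set, one team accounts for most of the time spent on incidents, which is exactly the kind of signal that might prompt a change to an on-call rotation.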

We'll see you at SRECon!

We couldn't be more excited for our first conference and couldn't think of a better place to make our presence known! If you're attending the conference later this month, please stop by booth #400 to say hello! We'll have plenty of swag, activities, and more for everyone.

If you want to read up on what incident.io is all about, check out our conference page here. And if you're ready to dive in and learn even more, book a demo before the conference. We'll be more than happy to answer any questions you have.

See you in Santa Clara!
