The timeline to fully automated incident response

We speak to engineering teams every day, and everybody knows AI is the future. Some tell us they’re massively accelerated by Claude, or that they’re rebuilding their product, team and ways of working. Cursor and Lovable have announced they’re building the last piece of software. Should we give in to the vibes? Embrace exponentials, and forget that the code even exists?

The reality is that things will still go wrong. They always do, at least from time to time.

So, what happens when they do? It's hard to embrace exponentials when you’re stuck vibe debugging an incident at 2am, for the third night in a row. It's our mission to be there for you and your team in that moment, no matter who wrote the code or how it was written. We believe it’s critical that you have a tool that’s just as capable of supporting you when things go wrong as the tools you’re using to build your product in the first place.

This year we’re building an AI incident responder that will work with you every step of the way. incident.io will investigate the incident with you, and show you what’s wrong and why it’s wrong by analyzing code, incidents, logs, monitoring and more. Eventually, we’ll show you how to fix it or offer to fix it on your behalf. It will feel like we’re right there with you, helping you resolve the incident much, much faster, just like your best and most experienced colleague would.

Okay, okay. This all seems a little far-fetched. What’s happening, and when? We’re ready to give you a sneak peek into how our product will look and feel, so you can understand how incident.io is transforming the future of incident management.

Today: AI agents that let you focus on investigating & fixing the incident

We started with Scribe and features like suggested summaries and follow-ups. These reduce the operational overhead of incident response, from configuring alerts to finishing follow-ups, so you can focus on actually fixing the issue. Scribe, for example, writes notes for you in real time during incidents and shares them in your incident channel, just like somebody tasked with taking notes on the call would, so no valuable context is lost.

We also already have @incident in private beta with customers like Ramp, so you can chat directly with incident.io in Slack or Teams. It’ll be generally available alongside our other new AI products later this year. It’s a capable agent that can take on much of the manual toil of incident response on your behalf.

These features are loved by the people using them every day. Here’s a little customer love for Scribe.

This year: AI agents that investigate incidents with you

Later this year we’ll launch Investigations, a new product that will help you investigate and ultimately resolve incidents significantly faster. We’ll be there with you from the moment you get paged, all the way until you’ve dotted the i’s and crossed the t’s on your postmortem.

Here’s how a future incident may look with Investigations:

02:00 AM:

Your phone buzzes with that dreaded on-call alert at 2 AM. Your heart sinks as you reluctantly reach for it, hoping it's just a false alarm. Instead of opening your laptop to fumble through a dozen tabs of 2 AM 2FA, you open the incident.io app to find an investigation already underway.

By the time you acknowledge the page, incident.io has analyzed the situation, created an initial diagnosis, and is already processing thousands of past incidents, code changes, and metrics to give you a running start.

Moments later the investigation is complete, with an overview of what’s going on and a precise root cause.

02:04 AM:

The situation looks serious - the payment-batch-handler is experiencing OOM errors, and customer payments aren't processing. You splash some cold water on your face and get your laptop out to respond, where you find a root cause with memory usage graphs and the specific PR that caused the incident. Next steps are laid out too, based on how your team has responded to similar incidents before. You ask a couple of targeted questions: "Have we made other changes to garbage collection recently?" and "Have we seen similar issues before?" Within seconds, incident.io provides concise, relevant answers.

02:07 AM:

You’re starting to wrap your head around what’s going on, but you want to dig into a bit more detail, so you jump into incident.io. One click takes you from Slack to the detailed incident page. Everything the investigation has uncovered is there - the problematic PR, metrics showing the exact pattern, deployment details, and relevant past incidents.

No more context-switching between a dozen different systems - all the evidence is easily accessible in a single click, and embedded natively in the page.

Looking through the details, you confirm the root cause as a memory leak in the batch processor from PR #4285. You can move on to reviewing the next steps suggested based on past incidents.

02:31 AM:

The suggested next steps look sensible, so you restart the affected pods and roll back the deployment. Now you can pause the incident, head back to bed, and take on the follow-ups during working hours.

The next day:

When you return to work, incident.io has already drafted a comprehensive postmortem, complete with an accurate timeline, contributing factors, resolution, and follow-ups.

You review, make a few tweaks, and share it with the team - turning what would have been hours of work into minutes.

Next year: AI agents that resolve incidents for you

Soon incident.io Investigations will become even more powerful. We’ll be able to accurately find the root cause of an incident and suggest the right next steps to resolve it more consistently, even for your most complex incidents. We’ll also offer to carry out those next steps on your behalf, so that at the click of a button incident.io will be able to resolve some incidents for you.

That doesn't mean AI will replace you. Our mission will always be to build products that work alongside teams to resolve incidents faster. Engineers will still need to manage and respond to incidents, likely at an even higher volume than before. The following trends give us confidence in the future of a human-in-the-loop model:

We know successful incident response affects your entire organization, requiring collaboration across support, legal, operations, engineering, infrastructure, and other teams. This cross-team collaboration is inherently human, and will remain so.
Ultimately, incidents are the outcome of all other controls failing. Even if 99% of issues can ‘self-heal’, there’s a long tail that this won’t apply to. With tools like Cursor and Lovable, the absolute scale of software will grow, so the 1% that AI systems get stuck on will be highly significant.
There’s also the irony of automation: not responding to the 99% makes it more difficult to respond to the 1%. You’ll be going in with less context, so you’ll need an AI system working with you.

The bottom line is that there are going to be more incidents, and they’ll be trickier to solve, so people will need much better tools to support them. These products will handle enormous tasks on your behalf, from analyzing massive datasets to writing thousands of lines of code. This is the future, and the good news is we’re building it.