We speak to engineering teams every day, and everybody knows AI is the future. Some tell us they’re massively accelerated by Claude, or that they’re rebuilding their product, team and ways of working. Cursor and Lovable have announced they’re building the last piece of software. Should we give in to the vibes? Embrace exponentials, and forget that the code even exists?
The reality is that things will still go wrong. They always do, at least from time to time.
So, what happens when they do? It's hard to embrace exponentials when you’re stuck vibe debugging an incident at 2am for the third night in a row. Our mission is to be there for you and your team in that moment, no matter who wrote the code or how it was written. We believe it’s critical to have a tool that’s just as capable of supporting you when things go wrong as the tools you use to build your product in the first place.
This year we’re building an AI incident responder that will work with you every step of the way. incident.io will investigate the incident with you and show you what’s wrong, and why, by analyzing code, past incidents, logs, monitoring data and more. Eventually, we’ll show you how to fix it, or offer to fix it on your behalf. It will feel like we’re right there with you, helping you resolve the incident much, much faster, just as your best and most experienced colleague would.
Okay, okay. This all seems a little far-fetched. What’s happening, and when? We’re ready to give you a sneak peek at how our product will look and feel, so you can see how incident.io is transforming the future of incident management.
We started with Scribe and features like suggested summaries and follow-ups. These reduce the operational overhead of incident response, from configuring alerts to finishing follow-ups, so you can focus on actually fixing the issue. Scribe, for example, writes notes for you in real time during incidents and shares them in your incident channel, just as someone tasked with taking notes on the call would, so no valuable context is lost.
We also already have @incident in private beta with customers like Ramp, so you can chat directly to incident.io in Slack or Teams. It’ll be generally available alongside our other new AI products later this year. It’s a capable agent that can do much of the manual toil of incident response on your behalf.
These features are loved by the people who use them every day. Here’s a little customer love for Scribe.
Later this year we’ll launch Investigations, a new product that will help you investigate and ultimately resolve incidents significantly faster. We’ll be there with you from the moment you get paged, all the way until you’ve dotted the i’s and crossed the t’s on your postmortem.
Here’s how a future incident may look with Investigations:
02:00 AM:
Your phone buzzes with that dreaded on-call alert at 2 AM. Your heart sinks as you reluctantly reach for it, hoping it's just a false alarm. Instead of opening your laptop to fumble through a dozen tabs and 2 AM 2FA prompts, you open the incident.io app to find an investigation already underway.
By the time you acknowledge the page, incident.io has analyzed the situation, created an initial diagnosis, and is already processing thousands of past incidents, code changes, and metrics to give you a running start.
Moments later the investigation is complete, with an overview of what’s going on and a precise root cause.
02:04 AM:
The situation looks serious: the payment-batch-handler is experiencing OOM errors, and customer payments aren't processing. You splash some cold water on your face and get your laptop out to respond, where you find a root cause with memory usage graphs and the specific PR that caused the incident. Next steps are laid out too, based on how your team has responded to similar incidents before. You ask a couple of targeted questions: "Have we made other changes to garbage collection recently?" and "Have we seen similar issues before?" Within seconds, incident.io provides concise, relevant answers.
02:07 AM:
You’re starting to wrap your head around what’s going on, but you want to dig into more detail, so you jump into incident.io. One click takes you from Slack to the detailed incident page. Everything the investigation has uncovered is there: the problematic PR, metrics showing the exact pattern, deployment details, and relevant past incidents.
No more context-switching between a dozen different systems: all the evidence is a single click away and embedded natively in the page.
Looking through the details, you confirm the root cause as a memory leak in the batch processor from PR #4285. You can move on to reviewing the next steps suggested based on past incidents.
02:31 AM:
The suggested next steps look sensible, so you restart the affected pods and roll back the deployment. Now you can pause the incident, head back to bed, and pick up the follow-ups during working hours.
The next day:
When you return to work, incident.io has already drafted a comprehensive postmortem, complete with an accurate timeline, contributing factors, resolution, and follow-ups.
You review it, make a few tweaks, and share it with the team, turning what would have been hours of work into minutes.
Soon, incident.io Investigations will become even more powerful. We’ll be able to accurately find the root cause of an incident and then more consistently suggest the right next steps to resolve it, even for your most complex incidents. We’ll even offer to carry out the next steps on your behalf, so that at the click of a button, incident.io will be able to resolve some incidents for you.
That doesn't mean AI will replace you. Our mission will always be to build products that work alongside teams to resolve incidents faster. Engineers will still need to manage and respond to incidents, likely at an even higher volume than before. The following trends give us confidence in the future of a human-in-the-loop model:
The bottom line is that there are going to be more incidents, and they'll be trickier to solve, so people will need much better tools to support them. These products will handle enormous tasks on your behalf, from analyzing massive datasets to writing thousands of lines of code. This is the future, and the good news is that we’re building it.
We’re hiring AI engineers, so apply to join us! Or, if you're interested in following along with our progress, we're building in public.