Just two months ago, we announced our $62M Series B funding and shared our vision for the future of incident management: one where AI agents work alongside engineers to investigate, diagnose, and resolve incidents faster than ever before.
Today, that vision becomes reality with the unveiling of AI SRE.
Anthropic's CEO predicts AI will write 90% or more of code soon. Andrej Karpathy's "vibe coding" went viral because it captured what every engineering team feels: tools like Cursor and Claude are slashing the time from idea to production, with AI churning out features in minutes.
But here's what's really happening in your organization:
Software has always been a tangle of intricate systems. AI cranks that complexity to 11.
We're thrilled to introduce an always-on AI SRE, which spots issues, surfaces root causes, and takes action to help resolve incidents. It connects telemetry, code changes, and past incidents to fix issues faster.
Let's take a look at a real incident walkthrough. Say your payment service goes down at 2 AM. Here's what happens:
2:00 AM - Alert fires. AI SRE immediately begins investigation.
2:01 AM - While your on-call engineer is still waking up, AI SRE has already:
2:03 AM - Your engineer opens Slack to find:
2:05 AM - Engineer: "@incident create a fix for this please?"
AI SRE creates a plan to clear the batch cache and set a max cache size to keep memory under control. Within seconds, it opens a PR with the complete fix.
2:15 AM - Service restored. Post-mortem already drafted. Engineer goes back to bed.
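To make the fix in this walkthrough concrete, here is a minimal sketch of what "clear the batch cache and set a max cache size" could look like. It is an illustration only, not the PR that AI SRE generated: the `BoundedCache` class, the `max_entries` parameter, and the least-recently-used eviction policy are all assumptions made for this example.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical size-capped cache (illustrative, not incident.io code).

    Evicts the least recently used entry once max_entries is exceeded,
    so memory use stays bounded no matter how many batches are processed.
    """

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        # Mark the entry as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict the oldest entries until the cache is back under the cap.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def clear(self):
        # One-off remediation step: drop everything currently cached.
        self._entries.clear()

    def __len__(self):
        return len(self._entries)
```

In this sketch, calling `clear()` corresponds to the immediate remediation, while the `max_entries` cap is the longer-term guard that keeps the cache from growing without limit again.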
The AI SRE handles the incidents you shouldn't have to wake up for. It triages every alert, resolves what it can autonomously, and only escalates when human judgment is truly needed. When you do get paged, you'll find the investigation already complete and solutions ready to review.
🔍 Investigate the issue: Triage and investigate your alerts, analyze the root cause, then recommend whether you should act now or can defer until later.
🎯 Find the root cause: Connect the dots between code changes, alerts, and past incidents to quickly uncover what went wrong and why.
💬 Ask it anything: Humans can chat directly with AI SRE to investigate deeper together. Ask "Have we seen similar issues before?" and within seconds, it will provide concise, relevant answers.
🚀 Resolve incidents for you: From spotting the failing PR to suggesting the fix, AI SRE investigates issues, surfaces next steps, and helps bring your systems back to health, even while you're sleeping.
📝 Draft your post-mortems in seconds: Instantly draft a post-mortem, complete with an accurate timeline, contributing factors, and resolution, then track the follow-ups for you and your team.
All of this happens without leaving Slack or Microsoft Teams. AI SRE pulls in relevant context from across channels, searches dashboards and logs from Grafana or Datadog, and can even fix bugs directly by generating pull requests. No tab switching, no context switching.
We know what you're thinking: "Great, now AI is going to wake up my team with hallucinated root causes!"
That's why we built AI SRE to be radically transparent. It surfaces evidence, not guesses. It shows its work, citing specific PRs, past incidents, and data sources. Every conclusion is traceable, and every recommendation is backed by data. Your engineers always make the final call; they just make it faster and with complete visibility into the AI's reasoning.
Incidents shouldn't stall the whole team. Let AI scan thousands of resources, from pull requests to dashboards, to find what's broken, share the context, and recommend next steps, so fewer people get pulled in and resolution comes faster.
AI SRE is live in production internally at incident.io and with a handful of our customers today. So far we've found that it:
Here's what our customers are saying:
While others bolt AI onto legacy tools, we've spent four years systematically reinventing incident management from the ground up.
Back at Monzo, we lived through the chaos firsthand: clunky tools, inconsistent processes, and barely a chance to learn from what went wrong. So we started with Response, the incident management platform we wished we'd always had, meeting engineers where they work inside Slack. Then came Status Pages, because keeping customers informed shouldn't add to the chaos. Then came On-call, a modern alternative to outdated paging tools that respects your engineers' time.
Now we're completing the vision we set out to build four years ago: an AI SRE that resolves incidents just like your best engineer.
All working together in a modern incident management platform, designed for how teams actually work, with AI at the core.
We've always believed that incident management shouldn't be about scrambling through dashboards at 3 AM. It should be about having the right information, at the right time, with clear next steps.
AI SRE delivers on that promise. It's your expert teammate who never sleeps, never panics, and gets smarter with every incident.
Ready to experience the only AI-native incident management platform? Get a demo to see the complete incident.io platform in action, or if you're already a customer, ask your account team about getting access.
Welcome to the future of incident management.
Ready for modern incident management? Book a call with one of our experts today.