For a long time, AI had been one of those technology domains everyone was curious about but not necessarily building with directly. Machine learning was still finding its way into most products, and LLMs existed but weren't yet mainstream. Products like ChatGPT, Claude, and Gemini were mostly still science fiction to the world at large.
Then, OpenAI released GPT-3.5 and ChatGPT, and everything shifted.
Almost overnight, it became the topic of every conversation. People on our team were arriving with ideas, experiments, and ultimately, the same question: What could this mean for us?
Soon after, Anthropic released Claude 3.5, then Claude 3.7 and Claude Code. Almost immediately, we saw a step change internally, not just in what we could build but in how we built it: the game had changed.
Back then, we were more focused on helping teams coordinate incident response: building tools that help companies communicate clearly and make decisions quickly. But as the frontier models got better and better, so did our experiments and ambitions.
It quickly became apparent that this wasn’t just a novelty — it was a chance to fundamentally change what we could build for customers, and how we built it.
Our initial experiments came together pretty quickly: small prototypes that summarized incidents, suggested actions, or drafted postmortems and executive comms. Those first glimpses of what could be were exciting, the kind of moments where we thought, this could be something special.
Then, reality hit.
Like most teams that have worked seriously with AI, we soon learned that the challenge isn't creating workable prototypes, small quality-of-life features, or great demos; it's making them reliable.
Getting a model to perform consistently across every situation, every customer, every dataset, every phrasing of a question? It's incredibly hard. AI is naturally non-deterministic and unpredictable. When it works, it really works, but when it doesn’t? Immediate loss of trust, outpouring of skepticism, and ultimately, frustration.
This challenge is even tougher when your product exists in the world of reliability.
Our customers depend on incident.io during some of their most critical moments, so “mostly right” isn’t good enough. Every answer, every suggestion, every piece of automation has to be consistently accurate, because people will make real-world decisions based on what our product shares and recommends.
For our most complex features, improvements in one area of the system were often met by regressions in another, and measuring correctness, let alone progress towards it, was difficult. The more we tried to extend our efforts to new use cases, more customers, and greater scale, the more obvious it became that we needed to rethink how we engineered this kind of product.
So, we got to work on the hard yards and built out our foundations: evals, scoring frameworks, backtests, and training sets. Alongside developing our AI features, we created an internal test and experimentation platform that lets us measure quality, compare models as they're released, and iterate at speed to build new features and fix issues quickly.
That tooling has since become essential to how we ship, giving us visibility into performance, reliability, and accuracy across every change we make. It’s been a major investment, but it’s what makes our AI product genuinely dependable.
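To make the idea concrete, here's a minimal sketch of what an eval-and-backtest loop can look like. Everything in it (the EvalCase shape, the fact-recall scorer, run_backtest) is illustrative rather than incident.io's actual platform; the real system is far richer, but the core idea of scoring model outputs against a fixed set of historical cases, so two model or prompt versions can be compared on the same data, is the same.

```python
# Illustrative sketch of an eval/backtest loop; names and scoring rule are
# hypothetical, not incident.io's internal API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str                # e.g. an incident timeline to summarize
    expected_facts: list[str]  # facts a correct summary must mention


def score(output: str, case: EvalCase) -> float:
    """Fraction of expected facts present in the model output (crude recall)."""
    hits = sum(1 for fact in case.expected_facts if fact.lower() in output.lower())
    return hits / len(case.expected_facts)


def run_backtest(cases: list[EvalCase], model: Callable[[str], str]) -> float:
    """Average score across a fixed set of historical cases, so two model
    versions or prompt variants can be compared like-for-like."""
    return sum(score(model(c.prompt), c) for c in cases) / len(cases)


if __name__ == "__main__":
    cases = [
        EvalCase(
            prompt="Summarize: payments API p99 latency spiked at 14:02; rollback at 14:20 resolved it.",
            expected_facts=["payments", "latency", "rollback"],
        ),
    ]
    # Stand-in for a real model call, just to show the loop end to end.
    fake_model = lambda p: "Latency spike in the payments API, fixed by a rollback."
    print(f"backtest score: {run_backtest(cases, fake_model):.2f}")
```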
For a deep dive into those systems, Lawrence shared an excellent overview in his talk at SEV0, which I highly recommend watching.
As our familiarity with the new AI models grew, they began changing how we operated internally too.
Today, AI is woven into almost every part of how we build incident.io. It’s used in design, engineering, customer success, marketing and even in how we communicate internally.
Our codebase, documentation and workflows are all designed with AI tooling and collaboration in mind, which means we can move faster, experiment more freely, and spend more time solving real problems and less time teaching LLMs to behave the way we want them to.
Culturally, AI has rapidly become part of our DNA. There’s an AI-chat channel in Slack that’s constantly active, with people sharing prompts, debugging model behavior, or celebrating small breakthroughs.
Experimentation is actively encouraged, and while we have sensible controls around things like security and resilience, typical constraints like team budgets have been lifted entirely. We know the best ideas come from giving people room to try things that might not work yet, and we don't want cost to be the reason we miss out.
That openness has created a kind of shared momentum. Everyone’s learning together, feeding discoveries back into how we work across the company. It’s been incredible to watch that collective curiosity turn into genuinely new capabilities.
Being an AI-native company isn’t about adding AI everywhere “just because we can”. It’s about knowing where it makes a meaningful difference — something we regularly hear positive feedback on from customers using our platform.
For us, that means every new product idea starts with the same question: How can AI make this better for our customers?
You can already see this across the platform today — from automatically generated post-mortems and incident summaries to the ability to ask questions directly in the dashboard and get instant answers about your incidents. We’re continuing to find new ways to build AI into the fabric of everyday reliability work.
This approach has led to some of the most exciting work we’ve ever done, including our new AI SRE product, which we demoed at SEV0 in San Francisco and London. It shows how AI can support reliability engineering by automating some of the hardest and most time-critical parts of incident response.
AI’s role here isn’t just to make responders faster — it’s to help change the nature of their work and how they respond to incidents. By letting machines handle the bulk of repetitive, time-consuming work, we can substantially reduce downtime, cut noise, and give engineers more space to focus on complex, high-leverage problems.
These are the kinds of applications that feel transformative, not just additive, and we believe they'll fundamentally change what teams are capable of. We're seeing it in the tools we use every day, and we want to do the same for others through the products we build.
The past year has been about more than just adding AI features to our product. It’s been about reshaping our company around a new way of building that’s faster, more adaptive, and more open to experimentation.
We’ve gone from exploring what AI could do to making it part of how incident.io works, every day. And while we’re proud of how far we’ve come, it still feels like we’re only at the beginning.
AI has the potential to completely redefine how teams manage incidents, to make them calmer, faster, and smarter. We’re just starting to see what that looks like, and we couldn’t be more excited about what’s ahead.
🎥 Watch the full story here: Weaving AI into the fabric of incident.io
