Register now: Why you’re (probably) doing service catalogs wrong
Since its launch in 2009, PagerDuty has been the go-to tool for organizations looking for a reliable paging and on-call management system. It’s been the operational backbone for anyone running an ‘always-on’ service, and it’s done the job well. Ask anyone about the product, and you’re all-but-guaranteed to hear the phrase “it’s incredibly reliable.” I agree.
But reliability isn’t everything. Many users feel PagerDuty has neglected innovation in its core on-call product, focusing instead on expanding into areas like AIOps and “the Operations Cloud”—whatever that is. Meanwhile, as teams increasingly manage incidents within communication tools like Slack and Microsoft Teams, PagerDuty has struggled to keep pace with the evolving market.
But innovation aside, many teams face a more fundamental challenge with PagerDuty: its core experience and data model for routing alerts—from the systems that generate them to the right person’s phone.
In this post, we’ll explore the core experience of setting up alerting systems and ensuring the right people get paged. We’ll compare PagerDuty’s data and routing model to how we’ve reimagined it with incident.io On-call, providing a markedly improved way to set up and operate on-call.
Finally, we’ll walk through two ways companies are successfully migrating from PagerDuty to incident.io. Let’s dive in!
If you’re reading this, you’re likely familiar with PagerDuty’s data model for routing alerts to on-callers. In the interest of getting everyone on the same page though, here’s a quick primer:
For visual learners, here’s how it looks:
There’s a lot to like about PagerDuty’s approach, particularly its simplicity when starting out—create a Service, connect monitoring systems, and forward alerts to an escalation policy. However, this simplicity comes with significant challenges:
In the interest of transparency, PagerDuty does offer more routing flexibility through Event Orchestration. However, it’s complex to configure and only available on higher-cost pricing tiers, like AIOps or Advanced Event Orchestration.
When designing incident.io On-call, we focused on solving the common pain points of the PagerDuty model while keeping the elements that work well. Our goal was to offer a system that could replicate a PagerDuty configuration without requiring a complete overhaul, while also enabling teams to simplify and improve their setup over time.
Here's how our data model works:
The eagle-eyed among you will notice the introduction of “Catalog” in the diagram above.
PagerDuty’s configuration is largely static and changes to teams or ownership require updates across multiple systems. With incident.io's Catalog, routing logic is centralized and dynamically linked to organizational data.
Best explained through example, let’s imagine our alert route logic is configured as follows:
if the alert has a team:
    escalate to the team's linked escalation path
else if the alert has a service:
    find the service's owning team and escalate to their escalation path
otherwise:
    escalate to a default escalation path
And our (simplified!) Catalog looks like this:
Teams:

Name | Escalation Path |
---|---|
Platform Team | Platform |
Security Team | Security |

Services:

Name | Owner |
---|---|
Deploy Service | Platform Team |
Auth Service | Security Team |
Now, if an alert arrives that looks like this:
{
  "name": "AuthServiceOffline",
  "team": "security_team"
}
We’d unpack it, find Security Team, and return the Security escalation path to the Alert Route, so the escalation arrives with the security team on-caller.
And if we received an alert like this:
{
  "name": "DeployHighErrorRate",
  "service": "deploy_service"
}
We’d unpack it, find Deploy Service, navigate to its owning team, Platform Team, and return the Platform escalation path to the Alert Route, so the escalation arrives with the platform team on-caller.
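The lookup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not incident.io's implementation: the `TEAMS` and `SERVICES` dictionaries, the `route_alert` function, and the `Default` path name are all hypothetical, and the snake_case keys simply mirror the example payloads.

```python
# Hypothetical in-memory stand-in for the simplified Catalog tables.
TEAMS = {
    "platform_team": {"name": "Platform Team", "escalation_path": "Platform"},
    "security_team": {"name": "Security Team", "escalation_path": "Security"},
}
SERVICES = {
    "deploy_service": {"name": "Deploy Service", "owner": "platform_team"},
    "auth_service": {"name": "Auth Service", "owner": "security_team"},
}
DEFAULT_ESCALATION_PATH = "Default"  # assumed name for the fallback path


def route_alert(alert: dict) -> str:
    """Return the escalation path name for an alert payload."""
    # Rule 1: the alert names a team directly.
    team_key = alert.get("team")
    if team_key in TEAMS:
        return TEAMS[team_key]["escalation_path"]

    # Rule 2: the alert names a service, so follow it to its owning team.
    service = SERVICES.get(alert.get("service"))
    if service and service["owner"] in TEAMS:
        return TEAMS[service["owner"]]["escalation_path"]

    # Rule 3: nothing matched, so fall back to the default path.
    return DEFAULT_ESCALATION_PATH


print(route_alert({"name": "AuthServiceOffline", "team": "security_team"}))
print(route_alert({"name": "DeployHighErrorRate", "service": "deploy_service"}))
```

One choice made in this sketch: an alert that names an unknown team or service falls through to the default path rather than raising an error, so a typo in a label still pages someone.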
Now, if we need to add a new service, Payments, whilst also changing the ownership of the Deploy Service to Security Team, we just update Catalog entries, leaving our routing logic entirely untouched. It’s all data.
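To make the "it's all data" point concrete, here is a small Python sketch of that change. The `catalog` structure and the `route_by_service` helper are hypothetical, and the owner assigned to Payments is an assumption made purely for illustration.

```python
def route_by_service(catalog: dict, service_key: str) -> str:
    """Resolve service -> owning team -> escalation path."""
    owner = catalog["services"][service_key]["owner"]
    return catalog["teams"][owner]["escalation_path"]


# Hypothetical stand-in for the simplified Catalog.
catalog = {
    "teams": {
        "platform_team": {"escalation_path": "Platform"},
        "security_team": {"escalation_path": "Security"},
    },
    "services": {
        "deploy_service": {"owner": "platform_team"},
        "auth_service": {"owner": "security_team"},
    },
}

before = route_by_service(catalog, "deploy_service")

# The organizational change is expressed purely as data edits.
catalog["services"]["payments"] = {"owner": "platform_team"}  # assumed owner
catalog["services"]["deploy_service"]["owner"] = "security_team"

after = route_by_service(catalog, "deploy_service")
print(before, "->", after)
```

The routing function is identical before and after the change; only the Catalog entries differ.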
Obviously, this is a simplified example. But in an organization with hundreds of teams and thousands of services, where Catalog is synchronized with your source of truth (like Backstage, ServiceNow CMDB, or your home-built solution), it’s a huge time saver, and it brings real peace of mind to know that on-call escalations aren’t something you need to worry about.
At first glance, the data models of PagerDuty and incident.io might seem similar. This is intentional—our goal is to provide a seamless migration experience. However, the subtle differences in our approach lead to significant improvements in managing on-call operations.
This approach not only reduces operational complexity but also makes scaling and adapting to organizational changes much easier.
We’ve helped hundreds of companies migrate, and while the specifics vary, most follow one of two paths:
Why choose this approach?
If you’re happy with your current alert routing and want to leverage incident.io’s advanced features without major changes, this is the best option.
Steps to migrate:
In a straight replication, the process is very straightforward, and looks broadly like this:
It’s a little time-consuming, granted, but with no changes to the routing model, it becomes a very straightforward task. This is the before and after of this migration:
Why choose this approach?
If you're looking to simplify and optimize your on-call setup, this approach centralizes routing logic and simplifies your configuration.
Steps to migrate:
This approach revolves around populating data in Catalog to map alerts to teams, and centralizing all routing logic within an Alert Route. The specifics will vary depending on how alerts route to teams, but in broad strokes it can be broken down as below:
if the alert has a team:
    escalate to the team's escalation path
else if the alert has a service:
    find the service's owner team and escalate to their escalation path
else:
    escalate to a default escalation path
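Because this approach depends on the Catalog data being complete, it can help to check for gaps before switching over. The Python sketch below is one hypothetical way to do that, assuming you have exported teams and services into plain dictionaries. The `find_routing_gaps` function and the data shapes are illustrative and not part of any incident.io API.

```python
def find_routing_gaps(teams: dict, services: dict) -> list[str]:
    """List services whose alerts would fall through to the default path."""
    gaps = []
    for key, service in sorted(services.items()):
        team = teams.get(service.get("owner"))
        # A gap is a missing owner, an unknown owner, or an owning team
        # with no escalation path linked.
        if team is None or not team.get("escalation_path"):
            gaps.append(key)
    return gaps


# Hypothetical export of Catalog data, with some deliberate gaps.
teams = {
    "platform_team": {"escalation_path": "Platform"},
    "security_team": {"escalation_path": "Security"},
    "data_team": {"escalation_path": None},
}
services = {
    "deploy_service": {"owner": "platform_team"},
    "auth_service": {"owner": "security_team"},
    "reports": {"owner": "data_team"},
    "legacy_cron": {},
}

print(find_routing_gaps(teams, services))
```

Any service the check returns would only ever reach the default escalation path, so it is worth fixing its ownership data before relying on the new Alert Route.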
Overall, this approach takes more effort to set up, but vastly simplifies ongoing management, making it easier to adapt to organizational changes.
From our own experience and the feedback we've had from our customers, we're confident incident.io On-call offers a significant step forward for managing on-call alert routing. Whether you choose to replicate your current setup or optimize it, we've got the tools and flexibility to support your needs.
And, honestly, this post skipped over all of the really good stuff! Things like cover requests, smart escalation paths, multi-rota schedules that support things like shadow on-call natively, and a mobile app that feels like it was built in this century.
I'm one of the co-founders, and the Chief Product Officer here at incident.io.