Why PagerDuty wasn't built for the rate at which engineering teams now ship code

May 29, 2026 — 25 min read
TL;DR: AI coding assistants fundamentally changed how fast software ships, and more deployments mean more deployment-triggered incidents, more alert noise, and higher on-call load. PagerDuty's business signals tell the story clearly: dollar-based net retention fell to 98% as of January 31, 2026, down from 106% as of January 31, 2025. That means existing customers, on average, now spend less than they did a year earlier. Meanwhile, we shipped over 200 fixes and features in Q1 2025 alone, with an AI SRE that delivers up to 80% reduction in MTTR. If your pager was built for human-speed development, it is now a bottleneck.

More code ships faster than ever. AI coding assistants like GitHub Copilot and Cursor accelerate pull request output dramatically, and every additional deployment creates a new opportunity for something to break in production. If your on-call tooling was designed for a world where teams ship once or twice a week, it is not equipped for the world you are operating in now.

PagerDuty built its platform for that older world. Its architecture is web-first. Its alerting model is strong, but its coordination layer, the part that matters most when your team scrambles at 2 AM, forces engineers to leave Slack, open a browser, and manually stitch together their response. The business data confirms what SRE teams already feel: PagerDuty's customers spend less each year, and R&D investment is not keeping pace. This article uses PagerDuty's own financial metrics as evidence and explains why AI-native, Slack-first platforms are where high-velocity teams are moving in 2026.

How AI coding assistants changed the incident surface area

AI coding tools compressed the development cycle. Tasks that used to take considerably longer now complete much faster. The result is more pull requests, more deployments, and a fundamentally larger incident surface area.

DORA's research tracks deployment frequency and change failure rate as two of its four key software delivery metrics, alongside lead time for changes and time to restore service. Elite performers deploy on demand, often multiple times per day, and that cadence is now within reach for mid-market engineering teams running AI-assisted workflows. It changes the math on everything your on-call stack needs to handle.

Managing AI-driven code change volume

When a team goes from 50 deployments per month to 200, on-call tooling gets stress-tested in a completely different way. Alert routing logic that assumed infrequent changes now produces noise. Runbooks that assumed slower releases become stale within weeks. And the humans responsible for on-call start seeing alert volumes their tools were never designed to route intelligently.
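To make the scaling concrete, here is a minimal Python sketch that multiplies deployment count by a change failure rate. The 15% rate is an assumed figure for illustration only, not a DORA benchmark or a measured value for any team. Substitute your own rate.

```python
def expected_failed_deploys(deploys_per_month: int, change_failure_rate: float) -> float:
    """Expected deployments per month that cause a production failure,
    assuming a constant change failure rate (an illustrative assumption)."""
    if not 0.0 <= change_failure_rate <= 1.0:
        raise ValueError("change_failure_rate must be between 0 and 1")
    return deploys_per_month * change_failure_rate


# Assumed 15% change failure rate, held constant as deployment volume grows.
for deploys in (50, 200):
    failures = expected_failed_deploys(deploys, change_failure_rate=0.15)
    print(f"{deploys} deploys/month -> ~{failures:.1f} failed changes/month")
```

At a constant rate, quadrupling deployments quadruples the expected number of failed changes. The only levers left are the rate itself and how quickly each failure is handled.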

The incident.io vs. PagerDuty comparison highlights exactly this gap: PagerDuty's architecture prioritizes alerting configuration, while modern teams need correlation, context, and coordination speed. Those are different engineering problems, and the platforms solving them look very different.

Frequent deployments spark more outages

More deployments mean more change failure events. DORA's research is explicit that speed and stability are not a trade-off: high deployment frequency and fast time to restore service tend to go together. Teams that ship more frequently also tend to resolve incidents faster, but only if their incident tooling can match the tempo.

Legacy tools create a fixed coordination overhead per incident regardless of volume. If your team spends 15 minutes assembling responders, finding the right runbook, and updating the status page every single time, doubling incident frequency doubles your coordination tax. That overhead comes directly out of MTTR and compounds as deployment volume grows.

Alert noise scales with deployment velocity

Higher deployment frequency produces noisier alert environments. More code changes mean more monitoring thresholds crossed, more transient errors that look like incidents, and more genuine incidents that need fast triage. Legacy tools without intelligent alert grouping and deduplication amplify the noise instead of filtering it.

Our alert deduplication groups related alerts automatically, cutting noise before a human is ever paged. PagerDuty includes basic alert deduplication via dedup key matching on standard plans, but advanced alert suppression requires the AIOps add-on at additional cost, layering complexity onto an already cluttered pricing model.
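Here is a minimal Python sketch of key-based deduplication. It assumes each alert carries a sender-supplied dedup key, which is the same idea as the dedup_key field in PagerDuty's Events API v2, and it uses a hypothetical five-minute grouping window. It illustrates the general technique only. It is not how incident.io or PagerDuty implement grouping internally.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    dedup_key: str    # hypothetical, e.g. "<service>:<check>" built by the sender
    summary: str
    timestamp: float  # seconds since epoch


@dataclass
class AlertGroup:
    dedup_key: str
    first_seen: float
    alerts: list = field(default_factory=list)


def deduplicate(alerts, window_seconds=300.0):
    """Group alerts that share a dedup key and arrive within window_seconds
    of the group's first alert. Each returned group would page a human once."""
    open_groups = {}  # dedup_key -> most recent group for that key
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = open_groups.get(alert.dedup_key)
        if group is None or alert.timestamp - group.first_seen > window_seconds:
            group = AlertGroup(alert.dedup_key, alert.timestamp)
            open_groups[alert.dedup_key] = group
            groups.append(group)
        group.alerts.append(alert)
    return groups


alerts = [
    Alert("api:latency", "p99 latency > 2s", 0),
    Alert("api:latency", "p99 latency > 2s", 60),
    Alert("db:cpu", "CPU > 90%", 90),
    Alert("api:latency", "p99 latency > 2s", 900),
]
for group in deduplicate(alerts):
    print(group.dedup_key, len(group.alerts))
```

In this example, four alerts collapse into three pages. The late latency alert opens a new group because it falls outside the window.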

What agent-speed engineering demands from on-call tooling

High-velocity teams need on-call tools that move as fast as their code. Specifically, they need three things: coordination that starts automatically the moment an alert fires, timeline capture that happens without a dedicated note-taker, and deployment correlation that connects a recent PR to an active incident in seconds.

Our Slack-native design delivers exactly this. Web-first tools with bolted-on chat integrations cannot replicate it architecturally because the fundamental data model is different.

Quick incident war room setup

The coordination tax is real and quantifiable. When an alert fires in PagerDuty, a typical workflow looks like this: acknowledge the alert in the PagerDuty web UI, manually create a Slack channel, invite responders by hand, copy the alert link, and paste context into the channel. That process consumes several minutes before any troubleshooting starts.

With a Slack-native tool, the Datadog alert fires and we automatically create #inc-2847-api-latency-spike, page the on-call engineer, pull in service owners based on the affected service, and start recording the timeline, all within seconds of alert ingestion. Engineers type /inc severity high and response begins. No browser tab. No manual setup.
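Automatic channel creation starts with a predictable, Slack-safe channel name. The Python sketch below assumes the inc-<number>-<slug> pattern from the example above. It applies Slack's channel-name rules, which allow lowercase letters, numbers, hyphens, and underscores, with a maximum of 80 characters. Paging, inviting service owners, and the actual Slack API call (for example conversations.create) are left out. This is illustrative and is not incident.io's code.

```python
import re

SLACK_CHANNEL_NAME_MAX = 80  # Slack's limit on channel name length


def incident_channel_name(incident_id: int, title: str, prefix: str = "inc") -> str:
    """Build a Slack-safe channel name such as 'inc-2847-api-latency-spike'."""
    # Lowercase, then replace every run of characters outside a-z0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    name = f"{prefix}-{incident_id}-{slug}" if slug else f"{prefix}-{incident_id}"
    # Truncate to Slack's limit without leaving a dangling hyphen.
    return name[:SLACK_CHANNEL_NAME_MAX].rstrip("-")


print(incident_channel_name(2847, "API Latency Spike"))  # inc-2847-api-latency-spike
```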

The on-call tool selection framework walks through how to evaluate this specific capability, with questions to ask any vendor during a trial.

Capture rich incident timelines

Manual note-taking during a live incident is a bad use of an engineer's attention. You want every responder focused on the problem, not on maintaining documentation. But post-mortems require accurate timelines, and if nobody captures them during the incident, you spend 90 minutes on Slack scroll-back archaeology three days later when details are fuzzy.

We capture everything automatically. Every status update, role assignment, Slack message, and call transcript populates the timeline as the incident runs. Our AI transcription tool, Scribe, records and summarizes incident calls via Google Meet or Zoom, extracting key decisions and flagging potential root causes without a dedicated note-taker.
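At its core, automatic timeline capture is a merge of time-ordered event streams. This Python sketch assumes each source (Slack messages, status updates, call notes) already yields its events in chronological order, and it uses heapq.merge to interleave them. The TimelineEvent type, source labels, and timestamps are hypothetical.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime
    source: str  # hypothetical labels such as "slack", "status", "call"
    text: str


def build_timeline(*streams):
    """Merge per-source event lists, each already in chronological order,
    into one chronological timeline (heapq.merge avoids a full re-sort)."""
    return list(heapq.merge(*streams, key=lambda event: event.at))


def ts(minute: int) -> datetime:
    return datetime(2026, 5, 1, 2, minute)


slack = [TimelineEvent(ts(0), "slack", "Incident declared"),
         TimelineEvent(ts(9), "slack", "Rollback started")]
status = [TimelineEvent(ts(5), "status", "Status page set to investigating")]
call = [TimelineEvent(ts(7), "call", "Decision: roll back latest deploy")]

for event in build_timeline(slack, status, call):
    print(event.at.strftime("%H:%M"), event.source, event.text)
```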

"1-click post-mortem reports - this is a killer feature, time saving, that helps a lot to have relevant conversations around incidents (instead of spending time curating a timeline)." - Adrián M. on G2

Connect deployments to incidents faster

Root cause identification takes longer when you manually correlate a recent deployment to an active incident. In a high-velocity environment where multiple engineers merge PRs every hour, the signal-to-noise problem in root cause analysis is acute.

Our AI SRE searches GitHub PRs, Slack messages, historical incidents, logs, and traces to build root cause hypotheses automatically. If the root cause is a code change, the AI SRE identifies the likely PR, surfaces it in Slack, and can generate a fix PR directly without the responder leaving the incident channel. The incident.io AI capabilities overview demonstrates how these capabilities work together end-to-end, from alert to fix generation.
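The simplest form of deployment correlation is a time-window filter that asks which PRs touching the affected service merged shortly before the incident began. The sketch below assumes a hypothetical MergedPR record and a two-hour lookback. It is a deliberately naive heuristic, not the AI SRE's method, which the article describes as also drawing on Slack messages, logs, traces, and historical incidents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MergedPR:
    number: int
    service: str
    merged_at: datetime


def suspect_prs(prs, incident_service, incident_start, lookback=timedelta(hours=2)):
    """PRs for the affected service merged within `lookback` before the
    incident started, most recent first."""
    window_start = incident_start - lookback
    candidates = [
        pr for pr in prs
        if pr.service == incident_service and window_start <= pr.merged_at <= incident_start
    ]
    return sorted(candidates, key=lambda pr: pr.merged_at, reverse=True)


start = datetime(2026, 5, 1, 2, 0)
prs = [
    MergedPR(101, "api", start - timedelta(minutes=30)),
    MergedPR(102, "billing", start - timedelta(minutes=10)),
    MergedPR(103, "api", start - timedelta(hours=5)),
    MergedPR(104, "api", start - timedelta(minutes=5)),
]
print([pr.number for pr in suspect_prs(prs, "api", start)])  # [104, 101]
```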

Unpacking PagerDuty's product gap via data

It is one thing to say a tool feels stagnant. It is another to point at audited financial metrics that confirm the innovation slowdown. PagerDuty's own earnings reports tell this story clearly.

Net retention rate fell below 100%

PagerDuty's net retention rate fell to 98% as of January 31, 2026, per their Q4 FY2026 earnings release. The prior year it was 106%. The year before that, it was higher still.

Net retention below 100% means existing customers spend less each year, on average. Some are churning. Others are downgrading. The cohort of customers PagerDuty had 12 months ago now pays less in aggregate than it did before, even accounting for upsells.
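The metric itself is simple arithmetic over a fixed customer cohort. This Python sketch uses the common textbook formula. PagerDuty's exact reporting methodology may differ. The dollar figures in the example are hypothetical and were chosen only so that the result lands at 98%. They are not PagerDuty's actual breakdown.

```python
def net_revenue_retention(starting_arr: float, expansion: float,
                          contraction: float, churned: float) -> float:
    """Dollar-based net retention for one customer cohort over 12 months.

    Counts only customers present at the start of the period, so revenue
    from newly acquired customers never moves this number.
    """
    if starting_arr <= 0:
        raise ValueError("starting_arr must be positive")
    ending_arr = starting_arr + expansion - contraction - churned
    return ending_arr / starting_arr


# Hypothetical cohort ($M ARR): expansion is outweighed by downgrades and churn.
nrr = net_revenue_retention(starting_arr=100.0, expansion=10.0, contraction=5.0, churned=7.0)
print(f"{nrr:.0%}")  # 98%
```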

For enterprise SaaS vendors at PagerDuty's scale, NRR of 110% or above indicates healthy expansion. Anything below 100% raises a direct question about whether the product delivers enough value to justify its cost year over year.

R&D investment declining as percentage of revenue

PagerDuty generated $492.5 million in revenue in fiscal 2026, per the same earnings release. Their R&D expense line, available through the SEC filings at investor.pagerduty.com, shows where product investment is prioritized.

PagerDuty's fiscal 2026 communications emphasized significant investment in research and development to lead the agentic era. But NRR fell 8 percentage points in a single year. That points to one of two conclusions: either the investment is not yet producing product improvements customers value, or customers are unconvinced by the direction.

Why engineering velocity stalls

PagerDuty's architecture is web-first. Coordination happens in a browser-based UI that sends notifications to Slack, not in Slack itself. That distinction creates friction at every step of the incident lifecycle: acknowledging alerts, creating Slack channels, assigning roles, and updating status pages all require context-switching out of the primary tool your team uses.

The PagerDuty vs. incident.io tool comparison covers this architectural gap in detail, including setup timelines and workflow comparisons from teams that have used both platforms.

Decoding PagerDuty's NRR for SRE teams

Financial metrics matter to your CFO. What matters to you as an SRE is what those numbers signal about the product experience your team will have in 12 to 24 months.

PagerDuty value declining for SREs

When NRR drops below 100%, it typically signals that revenue from the existing customer base is shrinking: customers churn or contract faster than others expand. That can reflect failed upsell and cross-sell efforts, rising churn, or reduced customer lifetime value. For enterprise SaaS at PagerDuty's scale, the business must then acquire new logos both to fund growth and to offset the revenue lost from the existing base.

Several of these pressures are visible in PagerDuty's current position. AIOps, the feature set that most directly helps SRE teams dealing with higher alert volumes from increased deployments, is a paid add-on at additional cost. And the peer set of modern incident management platforms is growing rapidly, with transparent pricing and genuinely Slack-native architectures.

"issues are right there in Slack, giving really good visibility into what sort of issues are being submitted and ensuring that people are responding... It's super easy and useful to look and see where things are." - Alex N. on G2

The on-demand webinar on migrating to incident.io covers common reasons teams initiate that evaluation and what they typically find during the trial period.

Why companies reduce their PagerDuty spend

Teams that reduce PagerDuty spend typically do so for a combination of reasons.

  • On-call pricing complexity: Teams often discover add-on costs at renewal that were not transparent upfront.
  • Manual post-mortems: PagerDuty's timeline data and Slack conversation data live in separate systems, requiring manual stitching for every post-mortem.
  • Coordination tax per incident: The manual setup overhead per incident compounds across incident volume, adding up to thousands of dollars annually in pure overhead (see the MTTR math below).

The 7 ways to reduce incident MTTR guide covers how Slack-native coordination eliminates most of these friction points by keeping everything in one place.

Why legacy tools slow incident response

PagerDuty's alerting rules engine is sophisticated. If you need deep, complex alert routing logic, PagerDuty's capabilities here exceed most alternatives. But alert routing is not where most teams lose time during incidents in 2026. Coordination is.

The best Slack-native platforms for 2025 breaks down exactly this distinction: tools designed for alerting versus tools designed for the full incident lifecycle, and why that architectural decision matters at 3 AM.

Swiftly addressing SRE pain points

The coordination overhead problem breaks into distinct phases: assembling the right people, locating runbooks and service owners, and then manually updating your status page at resolution. That overhead doesn't involve any actual troubleshooting, but it appears on every single incident regardless of severity.

We eliminate most of it. An alert fires, a channel is created, the on-call engineer is paged, service catalog context (owners, dependencies, recent deploys, runbooks) surfaces automatically, and timeline recording begins. /inc resolve closes the incident, updates the status page, drafts the post-mortem, and creates follow-up tasks in Jira or Linear. Our built-in migration tooling imports schedules and notification rules directly from PagerDuty, so you are not starting from scratch.

"With incident.io, managing incidents is no longer a chore due the automation that covers the whole incident lifecycle; from when an alert is triggered, to when you finish the post mortem." - Scott K. on G2

Q1 2025: 200+ updates for SREs

We shipped over 200 improvements in Q1 2025, including the AI SRE investigation capability, Scribe call transcription, the @incident AI assistant, native on-call scheduling, and AI-powered post-mortem automation. That cadence reflects a fundamentally different product investment philosophy. Every week, something ships that SRE teams can use immediately. Our public changelog shows exactly what changed and when, with no marketing spin.

Compare that to PagerDuty's H2 2025 product launch, which announced SRE Agent capabilities with general availability projected for October 2025, and their Spring 2026 release that put fully autonomous responder capabilities in "early access" for H2 2026. Some of those announced capabilities are now shipping. Others remain on the roadmap.

PagerDuty's on-call onboarding remains complex

PagerDuty's on-call experience has evolved over recent years with updates like Flexible Shifts and Shift Agent for managing conflicts in chat, but the core workflow for new on-call engineers remains browser-heavy and configuration-heavy. Getting fully operational takes meaningful ramp time.

Our on-call onboarding delivers a fast ramp because the entire workflow runs through slash commands in Slack. New engineers can participate in their first incident without memorizing a lengthy runbook. /inc escalate. /inc assign @sarah. /inc severity high. It feels like using a tool they already know.
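Slash-command workflows like these come down to a small dispatcher. When a user runs a slash command, Slack sends the app the command name and the remaining text in the command and text fields of the request. The Python sketch below assumes that text has already been extracted and routes it to hypothetical handlers. Real handlers would update incident state rather than return strings.

```python
from typing import Callable, Dict, List


def set_severity(args: List[str]) -> str:
    return f"Severity set to {args[0]}" if args else "Usage: /inc severity <level>"


def assign(args: List[str]) -> str:
    return f"Assigned {args[0]}" if args else "Usage: /inc assign @user"


def escalate(args: List[str]) -> str:
    return "Escalation requested"


# Hypothetical subcommand table; a real app would call into incident state here.
HANDLERS: Dict[str, Callable[[List[str]], str]] = {
    "severity": set_severity,
    "assign": assign,
    "escalate": escalate,
}


def handle_inc(text: str) -> str:
    """Route the text Slack sends after '/inc' to the matching subcommand."""
    parts = text.split()
    if not parts:
        return "Available: " + ", ".join(sorted(HANDLERS))
    handler = HANDLERS.get(parts[0].lower())
    if handler is None:
        return f"Unknown subcommand: {parts[0]}"
    return handler(parts[1:])


print(handle_inc("severity high"))  # Severity set to high
print(handle_inc("assign @sarah"))  # Assigned @sarah
```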

"You don't have to leave slack to run the incident. The commands are straightforward and quite intuitive, but you can also use buttons to navigate it." - Fina M. on G2

Is your on-call process ready for 2026?

The question is not whether your current on-call tool works. It is whether it scales to the incident volume and velocity your team will face as AI coding assistants continue to compress development cycles.

Why legacy tools inflate MTTR

MTTR is the sum of detection time, coordination time, diagnosis time, and resolution time, averaged across incidents. Legacy tools that create coordination overhead affect every incident, every time, regardless of the technical complexity of the underlying problem.
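Because MTTR is an average of per-incident durations, any minutes removed from one phase come straight off the mean. This Python sketch uses hypothetical sample incidents and field names to show the phase breakdown and what happens when coordination shrinks.

```python
from dataclasses import dataclass, replace
from statistics import mean


@dataclass(frozen=True)
class IncidentPhases:
    """Minutes spent in each phase of one incident (hypothetical fields)."""
    detection: float
    coordination: float
    diagnosis: float
    resolution: float

    @property
    def total(self) -> float:
        return self.detection + self.coordination + self.diagnosis + self.resolution


def mttr(incidents) -> float:
    """Mean time to resolution: average end-to-end minutes per incident."""
    return mean(incident.total for incident in incidents)


def phase_share(incidents, phase: str) -> float:
    """Fraction of all incident minutes spent in the named phase."""
    return sum(getattr(i, phase) for i in incidents) / sum(i.total for i in incidents)


incidents = [IncidentPhases(5.0, 15.0, 30.0, 10.0), IncidentPhases(3.0, 15.0, 20.0, 12.0)]
print(mttr(incidents))                                    # 55.0
print(round(phase_share(incidents, "coordination"), 2))  # 0.27

# Same incidents with coordination cut from 15 to 2 minutes each.
faster = [replace(i, coordination=2.0) for i in incidents]
print(mttr(faster))                                       # 42.0
```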

Here is the math for a 25-person on-call team running 15 incidents per month, counting only one engineer's time per incident:

  • 15 minutes of coordination overhead per incident × 15 incidents = 225 minutes (3.75 hours) wasted per month
  • At an estimated $150 loaded hourly cost per engineer = approximately $562.50 per month in pure coordination waste
  • Across a year, that works out to approximately $6,750 in avoidable overhead based on these assumptions alone, before accounting for additional responders or the extended MTTR impact on customer-facing downtime.

Your actual figure will vary depending on the loaded hourly cost, incident frequency, and coordination time per incident.
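Here is the same arithmetic as a small, parameterized Python function, so you can substitute your own inputs. The responders parameter is an addition that is not in the article's math. It assumes each incident's setup overhead ties up that many engineers at once. The figures above correspond to responders=1.

```python
def coordination_cost(incidents_per_month: int, overhead_minutes: float,
                      hourly_cost: float, responders: int = 1):
    """Return (monthly, annual) cost of per-incident coordination overhead."""
    engineer_minutes = incidents_per_month * overhead_minutes * responders
    monthly = engineer_minutes / 60 * hourly_cost
    return monthly, monthly * 12


monthly, annual = coordination_cost(15, 15, 150)
print(f"${monthly:,.2f}/month, ${annual:,.2f}/year")  # $562.50/month, $6,750.00/year
```

If three engineers typically join the setup, calling coordination_cost(15, 15, 150, responders=3) triples both figures.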

AI-ready platforms for modern incident response

PagerDuty's Spring 2026 release positions their SRE Agent as the answer to agent-speed engineering. According to their H2 2025 roadmap documents, the SRE Agent was projected to reach general availability in October 2025, with agent-to-agent Model Context Protocol (MCP) capabilities projected for H1 2026 and fully autonomous responder capabilities in early access in H2 2026. PagerDuty's Spring 2026 announcement did not confirm general availability for triage and diagnosis workflows specifically. That progression shows real investment.

The architectural question is different. When PagerDuty's AI identifies a likely cause, the responder still acknowledges in PagerDuty, correlates context in Slack, and manually bridges the two systems. The AI surface area is strong. The coordination layer that connects AI output to human action remains fragmented across tools.

Our AI SRE is a shipped product. It identifies root causes automatically, connects telemetry, code changes, and past incidents, surfaces the likely PR behind an incident in Slack, and generates fix PRs without a human leaving the incident channel. The Chief AI Officer Show covered how our AI agents can draft code fixes rapidly after an alert.

Bolt-on AI gives you AI features. AI built into a Slack-native platform from the ground up gives you AI-driven response. The practical difference shows up at 3 AM when you want your platform to handle the first 80% of the incident while you focus on the other 20%.

The PagerDuty limitations modern SREs face

| Capability | incident.io | PagerDuty | Impact on MTTR |
| --- | --- | --- | --- |
| Slack-native architecture | Built from day one | Web-first with Slack integration | Meaningful coordination overhead per incident |
| Automated channel creation | Auto-triggered on alert | Professional plan; workflow pre-configuration needed | Removes manual setup time per incident |
| AI root cause identification | Shipped; delivers up to 80% reduction in MTTR | SRE Agent projected GA October 2025 per H2 2025 roadmap; triage workflow GA unconfirmed in Spring 2026 release; autonomous responder early access H2 2026 | Faster diagnosis by surfacing the culprit PR |
| Automated post-mortem draft | Auto-drafted from captured incident data | Requires third-party tooling | Significant reduction in post-mortem writing time |
| Alert deduplication | Built into all plans | Basic dedup included; advanced suppression requires AIOps | Reduces alert fatigue at no added cost |
| On-call onboarding | Fast ramp via Slack commands | Browser-heavy onboarding process | New engineers contributing in days, not weeks |
| Transparent pricing | Public; on-call add-on shown upfront | Public pricing available; add-ons unclear at scale | Predictable annual planning |
| Support availability | Shared Slack support channel offered | 12x5 email support (Professional); P1/P2 SLA tiers (Business and above) | Faster issue resolution |

PagerDuty's AI readiness for dev teams?

PagerDuty's AIOps for major incident teams and service-owning teams represents genuine investment in signal correlation at scale. The underlying technology is capable.

The problem is architectural. Even with AI identifying a likely cause, the responder still needs to acknowledge in PagerDuty, coordinate in Slack, and bridge the two manually. PagerDuty's incident workflow automation can automate parts of this on Professional plan and above, but requires pre-configuration and still does not eliminate the context switch between the alerting system and the collaboration layer.

For teams that need autonomous response today, not projected for H2 2026 early access, the gap between what is shipped and what is roadmapped matters.

How incident.io reduces MTTR in practice

Slack-native coordination can significantly reduce MTTR through measurable components:

  1. Coordination time: drops significantly through automated channel creation, on-call paging, and service context surfacing the moment an alert fires.
  2. Diagnosis time: our AI SRE cuts root cause identification by connecting PRs, logs, and historical incidents automatically.
  3. Documentation time: Scribe transcribes calls, and our AI drafts post-mortems that are largely complete, eliminating the bulk of post-incident archaeology.

Favor's almost 40% MTTR reduction, a result specific to their environment and incident volume, came from eliminating coordination overhead for a team handling real production incidents. Their case study shows the before and after clearly and illustrates how reducing coordination time translates into measurable MTTR improvements.

Evaluate your on-call tool's fitness

Before your next renewal or vendor evaluation, run your current on-call stack through this checklist. An honest score tells you where the gaps are.

Architecture and workflow:

  • Does the entire incident lifecycle (declaration, escalation, resolution, post-mortem) run inside Slack without a browser?
  • Does your tool automatically create a dedicated Slack channel, page on-call, and start the timeline when an alert fires?
  • Can a new engineer run a full incident using slash commands in their first week?

AI and automation:

  • Does the AI identify the specific code change or service dependency behind an incident, or does it only surface correlated logs?
  • Does your tool auto-draft post-mortems from captured data, or does someone still write them from memory?
  • Is AI included in your current plan, or does it require a separate add-on purchase?

Pricing and support:

  • Is on-call scheduling included in your base plan, or is it a separate line item?
  • Can you get a bug fixed during an active incident via a shared Slack channel?
  • Is your vendor's NRR above 100%? Are they investing in the product or harvesting the customer base?

Vendor trajectory:

  • Did your vendor ship meaningful on-call or coordination features in the last six months?
  • Are their announced AI capabilities generally available or projected for H2 2026?

If your current stack fails three or more of these checks, the coordination tax you pay is compounding every incident. The on-call tool selection framework walks through each criterion with evaluation guidance.

Our migration tooling from PagerDuty imports schedules and notification rules directly, and the Rescue Program includes hands-on migration support. Teams typically become operational within three to five days of import.

Schedule a demo to see the AI SRE assistant in action and learn how the coordination difference can work for your team.

Key terms glossary

MTTR (Mean Time To Resolution): The average time from when an incident is detected to when it is fully resolved. MTTR includes detection, coordination, diagnosis, and fix time. Favor's almost 40% MTTR reduction after adopting incident.io, a customer-specific result, meant resolving incidents faster and reclaiming hours of engineering time monthly.

Context switch tax: The coordination overhead created when engineers navigate multiple tools during an active incident. Checking PagerDuty for the alert, Datadog for metrics, Slack for communication, Jira for tickets, and Confluence for runbooks creates several minutes of lost time per incident before any troubleshooting begins.

AI SRE: Our AI system that delivers up to 80% reduction in MTTR. Unlike AI that surfaces correlated logs, the AI SRE identifies the likely code change or service dependency behind an incident and can generate a fix pull request directly from the Slack incident channel.

Dollar-based net retention rate (NRR): A SaaS metric measuring whether existing customers spend more or less compared to 12 months prior. NRR above 100% means customers expand. NRR below 100%, as in PagerDuty's 98% as of January 2026, means customers contract or churn faster than others expand.

Slack-native architecture: A design pattern where the incident management platform's primary interface is Slack itself, not a web application that sends notifications to Slack. Slash commands like /inc escalate and /inc resolve manage the full incident lifecycle without leaving the chat tool your team already uses. That is how we built incident.io from day one.


Tom Wentworth
Chief Marketing Officer
