Updated March 06, 2026
TL;DR: The most feature-complete open-source PagerDuty alternative, Grafana OnCall OSS, entered maintenance mode on March 11, 2025 and Grafana will archive it on March 24, 2026, eliminating the strongest self-hosted option. Tools like GoAlert and OneUptime remain available at zero licensing cost, but self-hosting a production paging system adds real infrastructure overhead: PostgreSQL clusters, Redis, RabbitMQ, and Twilio for SMS/voice. When you factor in infrastructure plus engineer maintenance time, the "free" option can cost $18,720-$39,960 annually. incident.io delivers developer-friendly flexibility without the "who watches the pager?" risk: a managed Slack-native platform with no infrastructure to own, AI-powered post-mortems, and up to 80% MTTR reduction.
For teams tired of PagerDuty's pricing and stagnant feature set, open-source alternatives like Grafana OnCall promise freedom and cost savings. But self-hosting your incident response stack creates a new risk: who pages you when the pager breaks? A 50-person engineering team running PagerDuty Business pays approximately $24,600 annually before adding noise reduction, AI, and status pages as paid add-ons. That math naturally pushes engineers toward open-source options.
This guide evaluates the top open-source and self-hosted alternatives against modern SaaS platforms so you can decide where to invest your engineering hours, and avoid replacing one expensive problem with a more expensive one.
PagerDuty has accumulated features through acquisition rather than integration, creating a web of add-ons that inflate the real cost well beyond the advertised base price. Users on G2 consistently cite UI complexity and dated design as top frustrations, describing setup as confusing and the interface as unchanged from years past.
Five frustrations consistently drive teams away from PagerDuty: add-on sprawl that inflates the real price, UI complexity, a dated interface, confusing setup, and a stagnant feature set.
Read this comparison of PagerDuty, incident.io, and FireHydrant for a practical breakdown of how these tools differ in day-to-day use.
"Self-hosted" is not downloading a binary and running it on a single VM. A production-grade paging system needs high availability across multiple availability zones.
The infrastructure reality: a production deployment typically means a highly available PostgreSQL cluster, Redis, RabbitMQ, and a Twilio account for SMS and voice delivery, all replicated across availability zones.
The Twilio dependency is real and ongoing. Standard US SMS costs $0.0075 to $0.04 per message sent, plus $0.013 to $0.030 per minute for outbound voice calls. A team managing moderate alert volume should budget $300-$500 per month for Twilio alone.
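Those per-message rates compound quickly at paging volume. A back-of-envelope estimator makes the budgeting concrete; the volumes and rates below are illustrative assumptions, not Twilio quotes, and real bills also include carrier fees and country-specific surcharges.

```python
# Back-of-envelope Twilio notification cost estimator.
# All rates and volumes are illustrative assumptions, not quotes; actual
# Twilio pricing varies by number type, carrier fees, and destination country.

def monthly_twilio_cost(sms_sent, sms_rate, voice_minutes, voice_rate,
                        phone_number_rental=0.0):
    """Estimate monthly spend on outbound SMS and voice paging."""
    return sms_sent * sms_rate + voice_minutes * voice_rate + phone_number_rental

# Hypothetical team with moderate alert volume, using rates inside the
# ranges quoted above.
estimate = monthly_twilio_cost(
    sms_sent=8_000, sms_rate=0.03,        # $0.0075-$0.04 per SMS
    voice_minutes=3_000, voice_rate=0.025,  # $0.013-$0.030 per minute
    phone_number_rental=10.0,             # assumed number rental fees
)
print(f"${estimate:,.2f}/month")  # lands inside the $300-$500 budget range
```

Plugging in rates from the low end of the ranges drops the figure well below $300, which is why the budget band is wide.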
The risk nobody talks about:
If your Kubernetes cluster runs out of disk space at 3 AM, your incident paging system goes down exactly when you need it most. There's no SLA on a system you own. The team that built it has likely rotated, and the one engineer who understands the Twilio integration is on vacation. This is the "who watches the watchers?" problem, and it's not theoretical.
Here's the annual infrastructure baseline before adding a single engineer-hour of maintenance:
| Component | Monthly estimate | Annual estimate |
|---|---|---|
| Compute (multi-AZ, 3 nodes) | $300-$600 | $3,600-$7,200 |
| PostgreSQL HA | $200-$500 | $2,400-$6,000 |
| Redis + RabbitMQ | $100-$300 | $1,200-$3,600 |
| Twilio SMS/voice | $300-$500 | $3,600-$6,000 |
| Total infrastructure | $900-$1,900 | $10,800-$22,800 |
Grafana OnCall was the most feature-complete open-source on-call management tool available. Launched on Grafana Cloud in late 2021 and open sourced in 2022, it provides on-call schedules, escalation chains, alert grouping to reduce noise, and deep integration with the Grafana/Prometheus observability stack.
What it does well: mature scheduling and escalation chains, built-in alert grouping to cut noise, and first-class integration with an observability stack many teams already run.
The critical limitation you need to know:
As of March 11, 2025, Grafana OnCall OSS entered maintenance mode and Grafana will archive it on March 24, 2026. No further updates or new features will arrive. More critically, the OSS version relied on Grafana Cloud as a push notification relay for SMS, phone, and push notifications, and Grafana will also deprecate that connection on March 24, 2026.
If you build a production paging system on Grafana OnCall OSS today, you're building on a foundation with a published shutdown date only weeks away. You'll need to migrate to either Grafana Cloud IRM (paid, starting at $20 per active user per month on the Pro tier) or a different tool entirely.
The Slack distinction:
Grafana OnCall sends notifications to Slack but manages incidents through its own web UI. Your engineers receive an alert in Slack, then switch to a Grafana dashboard to manage the response. It's Slack-as-notification-endpoint, not Slack-native workflow.
GoAlert is a minimalist Go-based tool built by Target's engineering team for on-call scheduling and automated escalations. Target ships it as a single binary with a PostgreSQL backend, making it one of the simplest self-hosted options to deploy.
What GoAlert does well: a deliberately small surface area. One binary and one PostgreSQL database make it easy to deploy, upgrade, and reason about, and it covers schedules, rotations, and automated escalations.
Where GoAlert stops short:
GoAlert focuses on alerting and escalation, handling the "page the right person when an alert fires" problem. It doesn't include incident coordination features that most SRE teams need:
- Automated Slack channel creation and /inc commands
- Live incident timelines and context capture
- Post-mortem drafting and integrated status pages

GoAlert also requires you to configure a Twilio account for SMS and voice delivery. For teams outside the US, government regulations in some countries restrict two-way SMS, limiting notification options.
OneUptime combines monitoring, status pages, incident management, on-call scheduling, logs, metrics, and traces in a single Apache 2.0-licensed package. This breadth creates both OneUptime's appeal and its limitation: you're accepting "good enough at everything" rather than "excellent at one thing." For teams with genuine budget constraints and the engineering bandwidth to maintain a complex self-hosted stack, it's worth evaluating. The GitHub repository shows active development, but 6,000+ stars is modest compared to more established specialized tools, and concentrating this many critical operational functions in a younger project carries risk.
LinkedIn's Iris and its companion Oncall scheduling tool are battle-tested at true scale, processing over 700,000 messages daily with bursts exceeding 3,000 messages per second. The setup complexity matches that scale. Iris requires Python dependencies, LDAP integration for user management, and multiple configuration layers connecting Iris to the Oncall component. LinkedIn has continued developing the Iris message processor as a separate component, adding another dependency to manage. For a 50-200 person SaaS company, the operational overhead of running these projects likely exceeds the cost of a commercial tool.
Opsgenie is not open source. It's a SaaS product that frequently appears in open-source comparison searches because it historically offered lower pricing than PagerDuty. That distinction matters less now for one critical reason: Opsgenie is shutting down.
Atlassian has announced a phased wind-down: Opsgenie is no longer available to new customers, and support will end entirely in 2027.
Atlassian is moving Opsgenie's capabilities into Jira Service Management and Compass, consolidating its IT operations offering. If you're currently on Opsgenie, your migration timeline is set by Atlassian, not you. The migration routes your alerting and on-call configuration into Jira Service Management, an ITSM platform with a significantly different workflow focus than a developer-centric on-call tool.
"Free software" means zero licensing cost. It does not mean zero cost. The cost shifts from your vendor invoice to your engineering hours, and those hours are expensive.
Maintenance overhead: A conservative estimate puts self-hosted maintenance at 10% of one engineer's fully loaded time annually. Based on industry benchmarks for fully-loaded SRE compensation ranging from $79,200 to $171,600 per year (reflecting variation in seniority and geography, per sources like Glassdoor and levels.fyi), that's roughly $7,920–$17,160 per year in ongoing maintenance alone, before the initial 40-80 hour setup ($4,400-$8,800 one-time).
| Cost category | Annual estimate |
|---|---|
| Infrastructure (compute, DB, cache) | $7,200-$16,800 |
| SMS/voice (Twilio) | $3,600-$6,000 |
| Engineer maintenance (10% FTE) | $7,920-$17,160 |
| Total self-hosted TCO | $18,720-$39,960 |
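The table's totals are simple range arithmetic, which you can reproduce and adapt with your own numbers. The figures below are this article's estimates; your actual costs will differ.

```python
# Reproduce the self-hosted TCO ranges from the table above.
# Each entry is (annual_low, annual_high) in USD, taken from this article's
# estimates; substitute your own figures for a local TCO.

COSTS = {
    "infrastructure": (7_200, 16_800),   # compute, PostgreSQL, Redis/RabbitMQ
    "twilio":         (3_600, 6_000),    # SMS and voice delivery
    "maintenance":    (7_920, 17_160),   # 10% of one fully loaded SRE
}

low = sum(lo for lo, _ in COSTS.values())
high = sum(hi for _, hi in COSTS.values())
print(f"Self-hosted TCO: ${low:,}-${high:,} per year")
# → Self-hosted TCO: $18,720-$39,960 per year
```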
Feature gaps that compound over time: Open-source alerting tools handle the "page the right person" problem. They don't handle the incident coordination that follows: no automated Slack channel creation, no AI-powered root cause analysis, no timeline capture during active incidents, no AI-drafted post-mortems, no no-code workflow builder, and no integrated status pages. For a team handling 15 incidents monthly, manual post-mortem reconstruction alone costs $29,700 per year in engineering time at a 90-minute average per incident and $110 per engineer-hour.
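The $29,700 figure is straight multiplication from the stated assumptions, which makes it easy to rerun with your own incident volume and rates:

```python
# Post-mortem reconstruction cost, using the article's assumptions;
# adjust the inputs for your own team.

incidents_per_month = 15
hours_per_postmortem = 1.5     # 90-minute manual reconstruction
engineer_hourly_rate = 110     # fully loaded $/hour

annual_cost = incidents_per_month * 12 * hours_per_postmortem * engineer_hourly_rate
print(f"${annual_cost:,.0f}/year")  # → $29,700/year
```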
Security and compliance you own completely: When you self-host, your CISO's SOC 2 questionnaire gets harder. You own data encryption, access controls, audit logs, penetration testing, and data residency compliance. You're not inheriting a vendor's SOC 2 Type II certification. You're building your own, and every PostgreSQL security patch is your responsibility at 3 AM.
We didn't build incident.io as another SaaS alerting tool with a Slack bot. We built it as a Slack-native incident management platform where the entire incident lifecycle, from declaration through post-mortem, runs inside Slack without forcing engineers into a separate web UI.
The distinction from tools like Grafana OnCall's ChatOps approach is direct: Grafana sends notifications to Slack. We run in Slack. When a Datadog alert fires, incident.io auto-creates #inc-2847-api-latency, pages the on-call engineer, pulls in service owners from the Catalog, starts a live timeline, and begins capturing context. Your team assembles in under 3 minutes instead of 12.
What we built that open-source tools can't replicate:
AI SRE with autonomous investigation: Our AI SRE connects telemetry, code changes, and past incidents to surface root causes. Intercom Engineering documented a case where the AI generated the exact fix their team would have implemented, but in 30 seconds instead of 30 minutes. No GoAlert fork does that.
AI-powered post-mortems: Our Scribe feature joins your Zoom or Google Meet calls as a participant, transcribes in real-time, and captures decisions in a structured timeline. When the incident closes, the post-mortem is already 80% complete. You spend 10-15 minutes reviewing instead of 90 minutes reconstructing from Slack scrollback.
No-code workflow builder: Customize escalation logic, stakeholder updates, and evidence collection through a visual workflow editor without writing scripts. When Slack changes their API, we absorb the maintenance. You don't.
Zero infrastructure: Install incident.io in 30 seconds and run your first incident through Slack before the end of the day. No PostgreSQL cluster to provision. No Twilio account to configure. No 10% FTE maintenance budget.
Our Pro plan at $45/user/month with on-call delivers the full platform without the build vs. buy debate.
G2 reviewers who've made the switch describe the experience directly:
"We like how we can manage our incidents in one place... The recent addition of on-call allowed us to migrate our incident response from PagerDuty and it was very straight forward to setup." - Harvey J. on G2
"1-click post-mortem reports - this is a killer feature, time saving, that helps a lot to have relevant conversations around incidents (instead of spending time curating a timeline)" - Adrian M. on G2
"Frictionless configuration and onboarding (so easy that our first incident was created/led by a colleague even before the 'official rollout' all by themselves!)" - Luis S. on G2
For teams migrating from PagerDuty, the PagerDuty migration guide covers schedule and escalation policy import to minimize transition friction.
| Feature | PagerDuty | Grafana OnCall OSS | GoAlert | incident.io |
|---|---|---|---|---|
| Hosting model | SaaS | Self-hosted (archived Mar 2026) | Self-hosted | SaaS |
| License cost (50 users) | ~$24,600/yr+ | $0 (+ infrastructure) | $0 (+ infrastructure) | Pro: $45/user/month with on-call |
| Estimated TCO (50 users) | $24,600+ before add-ons | $18,720-$39,960 | $17,520-$36,360 (PostgreSQL + Twilio only; no Redis/RabbitMQ required) | ~$27,000/yr (Pro with on-call); lower net cost when MTTR savings factored in |
| Maintenance required | None | 10% FTE + project archived | 10% FTE | None |
| Slack integration | Notifications only | Notifications only | Minimal | Fully Slack-native |
| On-call scheduling | Yes | Yes | Yes | Yes |
| Incident coordination | Limited | No | No | Full |
| AI features | Add-on cost | Basic anomaly detection | None | AI SRE + auto post-mortems |
| Post-mortems | Manual/templated | Manual | None | AI-drafted (10-15 min edit) |
| Status pages | Add-on cost | No | No | Integrated |
| Service Catalog | No | No | No | Yes |
| Infrastructure required | None | PostgreSQL, RabbitMQ, Redis, Twilio | PostgreSQL, Twilio | None |
| Support | Premium tier for live chat | Community/paid cloud | Community | Shared Slack channel |
When open source makes sense:
You're early stage with zero budget, and you have an engineer with infrastructure bandwidth who genuinely enjoys maintaining distributed systems. GoAlert is a legitimate starting point for "we just need to page the right person." Grafana OnCall Cloud's free tier (3 users) works if you're already deep in the Grafana stack, but plan your migration off the OSS version before March 2026.
The honest check: if the engineer who'd maintain your self-hosted paging system is also the engineer you'd page for a P1, you have a conflict of interest built into your incident response architecture.
When incident.io makes sense:
In our experience, incident.io makes sense when your team is past early stage, you're handling 10+ incidents per month, and the 15 minutes of coordination overhead per incident starts showing up in MTTR metrics your VP Engineering asks about. You want post-mortems published within 24 hours instead of 3 days. You want new on-call engineers productive in 3 days instead of 3 weeks. And you do not want to own the infrastructure your paging system depends on.
The tradeoff is real and worth naming: incident.io is opinionated. Strong defaults accelerate setup and get you operational in days, but if you need infinite alerting flexibility and deeply custom workflows editable in YAML, PagerDuty offers more configuration surface. Most teams at 50-500 engineers don't need infinite flexibility. They need incidents to suck less, faster.
If you're migrating from PagerDuty or Opsgenie and want to see what the process looks like end-to-end, schedule a demo and we'll walk through your specific stack.
MTTR (Mean Time To Resolution): The average time from incident detection to resolution. Reducing MTTR is the primary metric for incident management effectiveness.
Escalation policy: A defined sequence of notification steps triggered when an alert isn't acknowledged. Typically routes from primary on-call to secondary, then to a manager.
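The escalation logic described above can be sketched in a few lines. The step names and timings here are illustrative, not any particular tool's defaults:

```python
# Minimal sketch of an escalation policy: given minutes elapsed without an
# acknowledgement, return who should be paged. Steps and timings are
# hypothetical, chosen only to illustrate the primary → secondary → manager
# routing described above.

from typing import Optional

POLICY = [  # (minutes_after_alert, target)
    (0, "primary-oncall"),
    (5, "secondary-oncall"),
    (15, "engineering-manager"),
]

def who_to_page(minutes_elapsed: int, acknowledged: bool) -> Optional[str]:
    """Return the current page target, or None once someone has acked."""
    if acknowledged:
        return None
    target = None
    for threshold, person in POLICY:
        if minutes_elapsed >= threshold:
            target = person  # latest step whose threshold has passed
    return target

print(who_to_page(0, False))   # → primary-oncall
print(who_to_page(7, False))   # → secondary-oncall
print(who_to_page(20, False))  # → engineering-manager
print(who_to_page(20, True))   # → None
```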
High availability (HA): A system design ensuring no single point of failure. For a self-hosted paging system, HA requires database replication, multiple compute nodes across availability zones, and redundant message queuing.
On-call rotation: A schedule assigning which engineer receives pages during a given time window. Tools like GoAlert, Grafana OnCall, and incident.io all manage rotation scheduling.
Post-mortem: A structured document analyzing what happened, why, and what changes prevent recurrence. Manual reconstruction typically takes 90 minutes. AI-assisted drafting in incident.io reduces that to 10-15 minutes of editing.
Coordination tax: The time lost to logistics before actual troubleshooting begins: creating Slack channels, paging the right people, finding context across tools. This typically runs 10-15 minutes per incident for teams using fragmented tooling.
Slack-native: An architecture where the entire incident workflow runs inside Slack through slash commands and automated channel management, rather than using Slack as a notification endpoint for a separate web UI.
ChatOps: An approach where operational actions and notifications are delivered through a chat platform. Distinct from Slack-native: ChatOps tools manage incidents in their own UI and push updates to Slack, while Slack-native tools treat Slack as the primary interface.


Ready for modern incident management? Book a call with one of our experts today.
