Torq

How Torq replaced PagerDuty & Rootly with one standardized incident workflow

Torq's engineering team was growing fast, but their incident process hadn't kept up. Two months after switching to incident.io, PagerDuty & Rootly were gone, 17 teams were onboarded, and developers who had actively avoided the old setup were lining up to migrate.


  • < 4 weeks to migrate off PagerDuty & Rootly
  • 17 teams onboarded
  • 100% of post-incident follow-ups documented & actionable

Torq builds security automation software for enterprise SOC teams, helping analysts investigate and respond to threats faster. Their customers include PepsiCo, Siemens, Marriott, Uber, and Virgin Atlantic.

When Nadav Baumer joined as DevOps Team Lead roughly eighteen months ago, he inherited PagerDuty for alerting, an abandoned Rootly implementation, and no standardized way of managing incidents across a fast-growing engineering org. Two months into making the switch to incident.io, PagerDuty was gone. Teams that were never on the rollout plan were asking for access, and developers who had actively avoided the previous setup were lining up to migrate.


When your incident tooling creates more work than it resolves

Torq's engineering team was growing fast, but their incident process hadn't kept up. Alerts fired into PagerDuty with no way to distinguish a noisy page from a real incident. Keeping stakeholders informed meant manually tracking down the right people and sending messages on Slack or WhatsApp. Postmortems had to be pieced together manually after the fact, with no structured incident record to work from. "Leadership once asked me how many incidents we'd had that quarter and the answer was, I have no idea. I could look at postmortems and manually count Slack channels, but I knew some were missing," recalls Nadav, DevOps Team Lead at Torq.

Nadav's manager, R&D Director Moshe Beladev, echoed the frustration from a different angle:

Before, stakeholders had to interrupt the engineers actively fixing the incident just to find out what was going on. Now, with automated update prompts and AI-drafted status messages, anyone joining can get up to speed without pulling focus from the people doing the work.

Rootly had been brought in to fill some of the gaps, but it never earned developer trust. The Slack integration fell short of what the team needed, and what looked flexible at first created more overhead over time, especially without a dedicated SRE owning the system. As the setup became more tailored, it was harder to maintain and support with confidence.

Rootly never really earned developer trust. Without a dedicated SRE owning the configuration, it couldn't keep up with us. The more we customized it, the more fragile it got, and when things broke, teams just went back to piecing things together in PagerDuty.

Meanwhile, the rest of the process required a lot of manual effort. When an incident was serious enough, managers would manually send WhatsApp messages to notify stakeholders. Postmortems were inconsistent, sometimes done in a shared doc, sometimes skipped entirely, depending on the team and severity. No standard template was reliably followed.

For a company with revenue growing roughly 300% in 2025, and new developers joining regularly, this fragmentation was becoming a real problem. Every new hire meant another person navigating a patchwork of disconnected tools with little standardized process to follow.


Flexibility that actually scales

Before Nadav joined, the team had already explored alternatives, with early proof-of-concept work across a couple of vendors, including incident.io. When Nadav came in, it was clear PagerDuty alone wasn't going to cut it. He drove a formal evaluation, comparing options head-to-head. The team already had high confidence in incident.io's technical capability from the earlier POC, and the formal evaluation confirmed that Torq could keep the flexibility it needed without taking on a configuration burden that would grow over time.

One thing stood out above everything else.

The thing developers liked most was the Slack-first approach: handle incidents where you already work, without ever having to switch context. For a team that lives in Slack, that was critical.

How it works today

All alerts now flow through incident.io with intelligent routing. Non-critical alerts go to team-specific Slack channels, configured dynamically via catalog attributes. Critical alerts go to a central channel and trigger escalation paths from there. Customer success has a separate flow that pipes directly into incident.io, triggering incidents and escalations automatically.
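The routing logic described above can be sketched in a few lines. This is a hypothetical illustration of the decision flow, not incident.io's actual configuration API; the team names, channel mapping, and `Alert` structure are assumptions made for the example.

```python
from dataclasses import dataclass

# Stand-in for catalog attributes mapping each team to its Slack channel
# (illustrative names, not Torq's real catalog).
TEAM_CHANNELS = {
    "platform": "#platform-alerts",
    "backend": "#backend-alerts",
}

CENTRAL_CHANNEL = "#incidents-critical"

@dataclass
class Alert:
    team: str
    critical: bool

def route(alert: Alert) -> dict:
    """Return the destination channel and whether an escalation fires."""
    if alert.critical:
        # Critical alerts land in one central channel and trigger an
        # escalation path from there.
        return {"channel": CENTRAL_CHANNEL, "escalate": True}
    # Non-critical alerts resolve their channel dynamically from the
    # team attribute, falling back to the central channel if unknown.
    channel = TEAM_CHANNELS.get(alert.team, CENTRAL_CHANNEL)
    return {"channel": channel, "escalate": False}
```

The key design point is that channel selection is data-driven: adding a team means adding a catalog entry, not editing routing rules.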

Every incident runs through Slack. A dedicated default incidents channel gives the entire org visibility into what's happening, while per-incident channels keep the actual response focused. Bot nudges (automated prompts that ask "do you need help?" or suggest escalating after 30 minutes) have been especially valuable for newer engineers who might otherwise sit on an incident too long alone.

The automated prompts are especially valuable for newer team members who may lack confidence in handling incidents solo.

Postmortems are now part of the incident flow rather than a separate process. Incident managers decide when one is warranted, and when they run one, action items stay connected to the incident record. For the first time, Torq has real visibility into incident volume, patterns, and response metrics. The question that once had no answer now has a dashboard.


Adoption came faster than expected

Torq planned a slow, gradual rollout, team by team, carefully managed. But once they got rolling, teams were actively asking to make the switch, pushing the rollout faster than anyone had planned. The enthusiasm caught everyone off guard.

We wanted to do it slowly, very slowly, but teams were lining up to be the next team to move. We intended to go gradually, and they actually pushed us to do it faster.

Then something even more telling happened. The field engineering team, who weren't part of the rollout plan at all, encountered the new incident flow organically during a cross-team incident. They immediately asked for access. So did customer success. Teams nobody had pitched were requesting to be onboarded.

For Nadav, that was the clearest signal: "When teams you didn't even plan to onboard are asking for access, you know something's working."

The product itself has played a role in sustaining that trust. A recent update to Slack message formatting - what Nadav called "a game-changer" - reinforced confidence that the product would keep improving. "The system is getting better daily. We've made requests via our shared Slack channel and many have already been shipped."

Torq went from disconnected tools to one platform that handles the full lifecycle. For a fast-growing company onboarding new developers regularly, that standardization matters more than any individual feature - a new hire on their first on-call shift knows exactly how incidents work, how postmortems are documented, and where alerts go, without having to navigate anything manually.

Our developers are happy, my managers are happy, overall everyone's happy. That’s really a best case scenario.


