Build, buy or maintain

Exploring incident management solutions

As you explore incident management solutions, you might be wondering whether to build, buy or maintain a tool you’ve already built. Use this guide, featuring stories from Skyscanner and Bud, to determine which approach is best for your business.

Chris Evans

Co-founder & CPO, incident.io

As organizations grow in size and complexity, the need for a streamlined incident management process becomes increasingly evident. It's no longer just a nice-to-have; it's a necessity for ensuring smooth operations and minimizing disruptions.

But when it comes to implementing an incident management solution, organizations are often at a crossroads: should they build their own tool in-house or invest in an off-the-shelf product? The decision isn't always as straightforward as it might seem. Many organizations, especially those that have grown rapidly, find themselves with a patchwork of solutions that have been cobbled together over time.

These might have been sufficient when the company was smaller, but as the scale of operations increases, so does the complexity of managing incidents effectively.

It's not just about choosing to build or buy; it's also about evaluating and unpicking the existing set-up and determining whether it still meets the organization’s evolving needs. At a certain point, it makes sense to pause and assess the situation. Should you continue to maintain your homegrown solution? Would it make more sense to start afresh and build something more suited to the current needs? Or should you invest in buying a specialized solution?


Let’s define an incident management platform

Before we dig into building and buying, let’s first define what we’re talking about when we say Incident Management Platform.

For many, the term is linked with paging solutions that allow you to configure and notify your on-call teams; for others, it’s the custom Slackbot they’ve built to create a channel for each new incident; and for some, it’s the solution they’ve built out of Jira, string and sellotape. Whilst all of these are important for dealing with specific aspects of incidents, none covers the full spectrum of activities that surround them. An incident management platform is a comprehensive solution designed to handle the full life cycle of an incident, from detection and classification to resolution and post-incident analysis.

A complete platform offers features like automated workflows, integrations with other systems, centralized dashboards for real-time insights, and even assistance with post-incident reviews to help you learn and improve.

The need for such a platform arises from the myriad pain points businesses experience when dealing with incidents, ranging from inefficient communication and a lack of real-time information to the sheer manual effort required to coordinate a response across various departments. And let's not forget the downtime costs, which can be a major drain on resources and reputation.

In essence, an incident management platform serves as a centralized hub that streamlines the entire incident process, contributing to faster detection and response, more efficient learning and knowledge distribution, and better mechanisms to ensure your process is followed correctly.


There are good reasons why people have built their own tooling

"Should I build or buy my spreadsheet software?", said nobody ever. Between Google Sheets, Excel, and the many other options, this is a solved problem. You'd be hard-pressed to find a compelling case for building such a tool in-house.

But build or buy remains a hot topic in the domain of incident management, and for good reason: off-the-shelf incident management solutions have failed to keep pace with the needs of modern organizations. Shifts from office-based work to hybrid and remote, changes in tooling and communication, and the pace of change have all contributed to unmet expectations. It’s no longer a case of opening a conference bridge and taking notes in a doc; it’s communication through tools like Slack, live integrations into monitoring and alerting tools, links to issue tracking tools and status pages, and automated workflows carrying some of the process burden.

Making your phone beep when an incident occurs is just the first step.

Many users feel under-served by what happens—or doesn't happen—after that initial alert.

The demands for features and integrations that cover the entire incident lifecycle have grown, leaving a gap that some organizations think they can fill better themselves.


Building and maintaining your own tooling

Building your own incident management platform certainly has its advantages, the most notable being full control over its features and functionality. This means you can fully tailor the system to your organization's specific needs, providing a seamless integration with your existing tools—whether they're developed in-house or bought off the shelf.

However, this level of customization doesn't come for free. It involves not just an initial time and resource investment for development, but also a long-term commitment for maintenance and updates. You need to consider this as an ongoing obligation rather than a one-off task: an ownership tax of sorts.

Assuming a best-case scenario where your tool gains widespread adoption and becomes indispensable, you're in for the long haul of supporting it like any other product. This involves not only applying regular security patches and addressing technical debt, but also meeting the growing demands for additional features from an expanding user base. Ignoring these feature requests risks driving users to seek alternative solutions, potentially leading them to consider buying an external platform.

One final point to consider when thinking about building in-house is the risk of falling behind the industry curve. Off-the-shelf solutions are continually updated with new features and follow the latest best practices. When you go it alone, you’re responsible for keeping your system current and competitive, which is easier said than done. Failing to stay up-to-date could leave you lagging behind, not just in terms of features but also in adopting industry standards, placing you at a competitive disadvantage. This is especially challenging when your primary focus is your core business.

A note on open source options

Building your system doesn't necessarily mean starting from scratch. There are open-source options available, like Monzo Response and Netflix Dispatch, that offer a solid foundation.

These projects can be both a blessing and a curse: while they are certainly good starting points, they can be hit or miss when it comes to support and community engagement. For organizations looking to prove the concept without fully committing to the build or buy options, these open-source platforms can be a reasonable middle ground. They offer a viable alternative if you can't afford a full-fledged solution but also don't have the time to build something entirely from scratch.


Buying a specialized incident management platform

Choosing an off-the-shelf incident management platform like incident.io offers some compelling advantages that are hard to ignore. One of the most significant is speed—getting a robust system up and running quickly, without the development headaches. Plus, with ready-made integrations for all your existing tools, you’re able to seamlessly integrate with your existing processes and quickly deliver value across multiple teams.

An often overlooked benefit is that buying also brings you into a larger community and ecosystem.

You're not just purchasing software; you're gaining the experience of the team that builds it and a community of peers who add value through shared insights.

This collective wisdom can be a game-changer for your incident management efforts.

Before diving in, it's worth giving a nod to a few considerations. First, there's a cost to purchasing a tool, but it's one that’s easily managed and that scales with your team and needs. Second, there's fit: with customization, integrations, configuration and extension through APIs, it’s rare that a platform can’t be made to suit your needs. And finally, there’s vendor lock-in. Whilst it should factor into your decision, any provider worth dealing with will provide full access to your data, rendering this somewhat irrelevant.

While these concerns are valid, they often pale in comparison to the immediate and long-term benefits of buying a ready-to-use solution. It’s important to give them thought, but they shouldn’t deter you from making what is often the smarter choice.


Key considerations before you decide

When weighing the build vs buy question for your incident management solution, there are several crucial factors to consider:

Scope and functionality

Before making a decision, the single most important thing to understand is the scope and requirements for incident management. It might seem simple to build a tool that automates the basics, like bootstrapping a communication channel during an incident. But this only scratches the surface of what a comprehensive incident management platform should offer. Most people look for some degree of process automation, more integrated customer communications, the ability to track and report on follow-up work, and improved workflows for creating and managing post-mortems to name a few.

It’s also important to think about who will be using the tool and what expectations they have.

  • How will your data teams pull and integrate data?
  • Do your senior leaders have expectations around how they’re notified and pulled in?
  • Do people in your legal and/or risk functions have regulatory requirements they must consider?

Building might seem appealing at the outset, but it’s important to consider the current and future requirements before opening your coding environment!

Hosting and infrastructure

It might sound obvious, but you’re going to want your incident management platform to work when everything else in your organization stops working. For this reason it’s important it’s not hosted on the same platform—or even in the same region—as everything else.

In practice, that often means introducing bespoke, one-off, build and deploy processes to ship your solution to a separate cloud, region or platform.

Building a new path to production in a way that meets your organization’s security requirements can be incredibly costly. And once it’s in place, there’s likely to be a knowledge gap with the rest of the organization, so you can expect questions like “how do we deploy that incident tool?” and “I need to update a software dependency, but I don’t have permission on that platform. How can I fix this?”

Security

Security is a top priority when it comes to incident management, given the sensitive nature of the data involved. If you're buying an off-the-shelf solution, you can often rely on the vendor's reputation and third-party security audits to ensure your data is safe. These vendors support large customer bases, so they've invested heavily in security measures—it's in their best interest to do so.

On the other hand, if you opt to build in-house, you'll need to commit resources to achieve similar security standards. Hosting considerations are crucial here. For instance, if you want to isolate your incident management system from your main platform to ensure uptime during incidents, you'll likely have to implement additional security measures for this separate environment. In either case, there's no room to skimp on security features like authentication, authorization, and restricted data access, especially when dealing with sensitive incidents that have strict access requirements.

Cost

The cost of a dedicated platform might seem high in comparison to using in-house engineers whose cost is already accounted for. But before you rule it out on price, it’s important to weigh things up fairly.

The true cost of building software is always higher than the cost of the time it takes to write the code:

  • Firstly it’s important to carefully scope what you’re building. What might start as a two-week project can quickly spiral into months if you’re not careful.
  • Next, consider the salaries for the team that will design and implement the first version.
  • Then, consider the effort required to maintain and support the system. This might feel negligible, but the costs can be significant over time, and are best thought of as a background tax on all future work.
  • Finally, you’ll want to consider the opportunity cost of diverting resources from other projects. The chances are your company doesn’t sell incident management platforms, so any effort spent building one detracts from effort that could be contributing to your business.
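
To make these cost components concrete, here is a minimal sketch in Python that adds them up over a planning horizon. Everything in it is an assumption made for illustration: the class and function names are hypothetical, the figures are placeholders rather than real pricing or salary data, and opportunity cost is deliberately left out because it is hard to put a number on.

```python
from dataclasses import dataclass


@dataclass
class BuildEstimate:
    """Illustrative inputs only; replace with your own figures."""
    engineers: int                    # people working on the first version
    build_months: float               # scoped duration of the first version
    monthly_cost_per_engineer: float  # fully loaded cost, not just salary
    maintenance_fraction: float       # share of one engineer's time spent on upkeep
    overrun_factor: float = 1.0       # 1.0 = on schedule, 2.0 = takes twice as long


def build_cost(e: BuildEstimate, horizon_months: int) -> float:
    """Initial build cost plus the ongoing maintenance 'tax' over the horizon."""
    actual_build_months = e.build_months * e.overrun_factor
    initial = e.engineers * actual_build_months * e.monthly_cost_per_engineer
    # Maintenance only starts once the first version has shipped.
    maintenance_months = max(horizon_months - actual_build_months, 0)
    upkeep = e.maintenance_fraction * e.monthly_cost_per_engineer * maintenance_months
    return initial + upkeep


def buy_cost(monthly_price: float, horizon_months: int) -> float:
    """Subscription cost over the same horizon (assumes a flat monthly price)."""
    return monthly_price * horizon_months


# Hypothetical scenario: a three-month project that takes twice as long as scoped.
example = BuildEstimate(
    engineers=2,
    build_months=3,
    monthly_cost_per_engineer=12_000.0,
    maintenance_fraction=0.25,
    overrun_factor=2.0,
)
print("build:", build_cost(example, horizon_months=36))
print("buy:  ", buy_cost(4_000.0, horizon_months=36))
```

Varying `overrun_factor` and `maintenance_fraction` shows how sensitive the build figure is to scope creep and ongoing upkeep, which are the two components most often underestimated.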

All things considered, that quote to buy a platform probably seems a little more reasonable now!

“We've spent less on incident.io than we have on just maintaining our old tool. And that's a dollar-for-dollar easy comparison to make.”

Time to value

Unless you have an exceptionally painful procurement process, buying software is almost always going to offer faster time to value, and this should feature as a key consideration in your decision-making process. It’s also worth asking: can the organization you’re buying from operate at the speed you need them to?

In most cases, building is going to take more time than buying, so it’s important to consider how urgently you need a solution, what the initial scope looks like, and think about how quickly you can iterate and update the system to meet changing needs once it’s in the hands of users.

Flexibility

When it comes to build vs buy, flexibility is one axis where building will always have the edge: when you own the code, you can literally do anything. But before you chalk up a win in the build column, it’s worth looking closely at what you want to achieve, and whether an incident management platform offers enough flexibility to support your use-cases, and at a lower cost.

Key areas to consider:

  • How flexible is the platform at allowing you to customize names and descriptions to your organization? It’s going to be frustrating to choose a platform that calls things “post-mortems” when you call them “incident debriefs”. And does it support customization of things like your incident lifecycle stages, the timestamps you want to collect, and incident severities?
  • Can users of the platform configure their own no-code automations?
  • Does the system integrate with the other tools that need to work together during incidents? Things like issue trackers, document editors, and service catalogs. And when there are no integrations, can you extend the functionality using APIs and webhooks?

Ultimately, if you're buying, check how flexible the platform is when it comes to supporting your actual needs. If you're building, this flexibility is in your hands, but it demands time and expertise to execute well.

Support

When you opt for a purchased solution like incident.io, one of the unsung benefits is having a dedicated support team at your fingertips. Think of it as offloading a significant chunk of work and worry. Instead of diverting your attention to troubleshoot issues or implement new features, you can simply reach out to the vendor's support team for quick and expert help.

During your evaluation phase, it's crucial to assess the responsiveness and expertise of support. Make sure they not only meet your expectations but can also act as a true partner in maintaining and improving your incident management process.

External support isn't just a nice-to-have; it's a significant time-saver that allows your internal teams to focus on your core business activities rather than being slowed down with maintenance and troubleshooting.

By relying on a robust support organization, you're essentially gaining an extended team that’s specialized in ensuring your incident management runs smoothly, saving you considerable time and effort in the long run.

Adoption

If you care about adoption, and you really should, your incident management tool needs to be more than just functional—it needs to be intuitive (and even delightful!) to use.

This is especially crucial during the heat of an incident, where stress levels are high and time is of the essence. People don't have the luxury of navigating through a cumbersome interface or figuring out how to do things; they need something that's straightforward, ergonomic and user-friendly.

But the challenge doesn't stop there. Internal tools are often "engineered for engineers," which can hinder wider adoption across the organization. Incident management often involves multiple departments, including those staffed with non-technical folks. If the tool isn’t designed with them in mind, you risk creating an internal solution that serves only a portion of the people who need it.

By opting for a well-designed, off-the-shelf solution, you're not just buying a tool; you're buying years of UX research and design aimed at creating an intuitive experience for a broad range of users. This is an often underestimated advantage that can be particularly beneficial during high-stress, time-sensitive situations like incidents.


Build or buy, it’s up to you. But probably just buy.

Like most things, the build, buy or maintain decision for incident management is a complex one, full of trade-offs and considerations. The appeal of building in-house is clear, with enhanced control and customization, and the fun of throwing engineering effort at an interesting problem!

But all things considered, the time, resources, and expertise required typically outweigh these benefits, and what you’re left with is a poor imitation of a platform you could have purchased for a fraction of the time and cost.

At incident.io, we have decades of collective experience in incidents and incident management, we talk to thousands of companies about the subject, and we spend tens of thousands of hours designing, building and iterating on a platform that allows organizations to truly level-up how they deal with things going wrong. You’d be hard-pressed to build a tool that rivals that!

So whilst there may be specific instances where building is the right choice, for the majority, evaluating and buying a solution is the sensible decision, so you can focus on what your organization does best.

  • “We started building a Slack bot which was easy to get up and running, but the further we went down that path, the more we realised we were reinventing the wheel.”

  • “We didn't want to put the manpower behind having to build something, and the idea of having to maintain it indefinitely was just not something that we were interested in doing.”

  • “If you are going to lose that flexibility, you want to find a partner who is going to listen to your feedback. At incident.io, you're almost obsessed with making the product better.”

  • “We could not have developed a package of software as complete and rich as incident.io without a significant squad of people to build it, run it, and deal with all the little bugs.”

  • “We partner with a ton of different third party vendors, and this has definitely been the easiest one and the best relationship I've seen in all of my time.”

  • “Building a Slackbot is not that challenging, but doing it at scale across a lot of incidents and doing it well is hard.”

  • “The speed of iteration is fantastic. Seeing that was the point where I got absolutely huge confidence that incident.io were the right choice.”

  • “If you look at the aggregate time spent on incidents, a tool like incident.io only has to make us a fraction more efficient to pay for itself.”
