How Bold Commerce is reimagining the definition of incident management with incident.io

With incident.io, Bold Commerce has expanded its use cases for an incident management platform—bringing security incidents and maintenance events into the fold

Key Benefits

  • A user-friendly UI that makes declaring incidents much easier
  • Powerful custom workflows that allow for ultimate control
  • An intuitive platform that can be used for more than just incidents
  • Consolidation that removes the need for disparate tooling
Just because it's called incident.io doesn't mean it's just for incidents. It's actually a workflow tool for us at this point.
Craig Kinloch-Melia
Head of Technology

For Craig Kinloch-Melia and his team at Bold Commerce, incident.io is arguably the most flexible tool in their tech stack.

Not only is it their go-to for incident response, but it’s also leaned on for a wide spectrum of other work: product launches, security issues, Black Friday preparedness, maintenance events and more.

For them, the idea that incident management tools, like incident.io, are designed exclusively to “break in case of emergency” is a missed opportunity.

At Bold Commerce, incident management isn’t just reactive work for common and critical errors. It’s an opportunity to funnel tasks and fast-moving projects into a well-defined process that allows for better collaboration, learning, and outcomes across the board. incident.io powers this every step of the way and enables proactive work that helps prevent incidents entirely.

On board since the early days of incident.io

As the Head of Technology at Bold Commerce, Craig oversees some of the biggest and most impactful technical projects at the company. Bold Commerce is the go-to checkout solution for global companies, and Craig is one of the main leaders responsible for ensuring the product is ready and available at all times.

So, given the scope of Bold Commerce’s product, there’s a lot at stake, and incident response plays a big part in ensuring smooth sailing at all times.

As an early adopter of incident.io, Craig understood the importance of having a robust incident management and response process in place.

“We've actually been incident.io customers before it even existed. We used the open-source version of it for a long time,” says Craig in reference to the early version of incident.io built by CPO Chris Evans when he was at Monzo Bank.

“At first, we only used the open-source version for incidents. It was a great way of coordinating responses. But just before Black Friday, our director of software said, ‘Let's try to use this for Black Friday and Cyber Monday.’ We tried it, and it was like, ‘Yeah, this is so much better than the old way we were managing things.’”

What started off as an experiment would very quickly balloon to cover a wide variety of use cases.

Workflows, improved

Prior to using this early version of incident.io, Craig and his team would coordinate response processes through Slack, but this was often done in random and disparate channels.

“We'd have people randomly Slack each other if there was an issue. We had a 911 channel that people would post into, and then that message would trigger an alert to the operations team. They’d then have to set up an incident channel even if it wasn’t an operational incident,” says Craig.

Moving to incident.io’s workflow was a big revelation for us. Now, anyone can create an incident. They don't have to ping anyone. They just create it, and it's there.

Once incident.io was further along in its development roadmap, one feature in particular piqued Craig's interest and helped everything else fall into place: Workflows.

“For us, with Workflows, it was like, ‘Okay, now we're getting more control. Now, we're getting nudges that can be customized. We now have different types of incidents we can develop.’ And as we looked at the development roadmap more broadly, we thought, ‘There could be other ways to utilize incident.io here,’” says Craig.

“And so all of those things added to the fact that other workflows can go through this. Just because it's called incident.io doesn't mean it's just for incidents. It's actually a workflow tool for us at this point.”

It may be unconventional, but it really works

Now, Craig and his team at Bold Commerce are using incident.io in a handful of creative ways that utilize the platform outside of core incident response.

For them, it’s simple: anything that needs a robust, tried-and-true process where folks need to communicate and work on something collaboratively is a prime candidate for incident.io.

Thankfully, a lot of very important work falls under this category. But none of this would be possible without a solution that was intuitive, seamless, and easy to use.

It's the intuitiveness that’s really essential to the platform being successful. The documentation and training we have on incident.io is much less than anything we have with other tooling because it's intuitive. If you can work Slack, you can probably work in incident.io.

Security incidents

For a platform like Bold Commerce, security incidents are ones that need to be nipped in the bud with extraordinary precision and speed. So when incidents like these strike, Craig and his team look to incident.io to help them through.

“If there's anything that’s a data issue, we have a default private incident that's generated, and then we use that channel to record everything about the incident. Sometimes everything in that private incident channel goes into a legal document as well,” says Craig.

“incident.io is doing all of the incident recording for us. I think some of the new AI features are actually putting in a lot of that information we know is important rather than us having to pull it all from Slack and put it into a document. We have a timeline that we can pull and be able to put that in along with the documentation summaries that we do.”

Tiger teams

Sometimes, at Bold Commerce, there are instances where something that isn't an incident needs to be responded to quickly. For moments like these, a Tiger Team comes together, and an incident channel gets created to manage the ask.

“It's not necessarily a break in something, but maybe a feature request has come in that needs a whole bunch of people to get together quickly. When it’s something that needs a fast response, we'll create an incident channel where people are coming in, brainstorming, and quickly coming up with the solution and getting it out there,” says Craig.

Black Friday & Cyber Monday

Two of the most stressful times of the year for e-commerce platforms like Bold Commerce are Black Friday and Cyber Monday. During these busy days, everything needs to go according to plan. Any significant disruptions can mean the loss of millions of dollars in revenue.

Needless to say, all hands are on deck to deal with incidents quickly, and C-suite folks are especially tuned in to ensure that anything that does come up is moving toward resolution.

“For us, this is where incident.io comes into its own. We have a channel where people go for Black Friday. This means that my co-founder, Eric, and I don't have to sit around and look at a channel all day saying, ‘OK, has an incident come in?’” says Craig.

“We actually have set up a Workflow to say, if you're working on Black Friday and create an incident, just ping it straight over to PagerDuty and just page both of us.”

The result of this automation is peace of mind: if something goes wrong, they’ll know about it right away.

It helps us be very responsive as a company now that we're remote, spread across the North American region. It helps us get the attention we need because there are a lot of Slack messages flying around and many things happening on a day-to-day basis. It just brings people together so much quicker.

Product launches

For engineering and product teams, pushing a new feature live is a huge moment. But during launch day, so many things can go wrong, so it’s important to nail it from start to finish and not get lost in the excitement.

“If we’re launching something big, we create an incident. It’s a single place of contact while we’re going live to be able to coordinate actions rather than just creating a one-off Slack channel. But the use case isn’t just about creating a channel for us,” says Craig.

“It’s about actually allowing us to be able to record and document a lot of the things that are going on as well to be able to put it into a post-mortem document. We also create follow-ups as well. So once we’re done with the launch, we can go into incident.io, look up a list of follow-ups, and say, ‘This one didn’t get completed. Is that deliberate?’ And just make sure we close the loop on everything.”

Planned maintenance

Finally, while planned maintenance is a regular occurrence, it can be quite disruptive. And for folks who aren’t looped in, it can be hard to figure out whether an issue they’re having is related to ongoing maintenance. To be proactive about addressing some of these issues, Craig and his team create maintenance incidents to keep everyone informed about what’s going on.

“Normally, with planned maintenance, there's documentation ahead of time, but there’s a maintenance window that we've created. We create an incident for it that lets everyone know that this is what's happening,” says Craig.

“So then if anything weird happens, teams can look and say, ‘Yeah, there's maintenance going on.’ I can jump into that and then quickly see if there's anything related to this, and we can correlate very quickly any fallout from maintenance. It also keeps people informed along the way.”

About the interviewee

Craig Kinloch-Melia is the Head of Technology at Bold Commerce, where he helps spearhead strategies to enhance merchant checkout experiences. With a 25-year background in technical leadership in fields ranging from defence to education, in both the private and public sectors, Craig has driven the success of development teams and operations in high-impact projects. His passion lies in adopting emerging technologies for groundbreaking results.

Industry: E-commerce
Customer since: 2021
Company size: 250+
Office model: Hybrid
