With incident.io, Altis has implemented a scalable, repeatable model for incident management that can grow easily with the organization.
Altis is an enterprise-level hosting platform serving companies such as Snopes, Red Bull Media House, and Yell. It is part of Human Made, a fully remote organization of about 80 people around the world.
As an enterprise host, Altis must provide incident management support to customers on the platform. Before adopting incident.io, however, its process for managing customer-reported incidents was unclear. Incidents were run through a pre-existing, central Slack channel. Because this channel handled multiple incidents at any one time, communications were easily lost among other messages.
Overall, there was a lack of focus, no clear lead for any given incident, and a generally haphazard approach. Collaboration and communication in particular suffered as a result. Without a clear structure, the team found that there was a lot of confusion, with efforts sometimes duplicated or important actions missed.
We all knew that wasn't a great system, but we didn't see an easy way to replace that.
incident.io has helped Altis to establish a scalable, repeatable model for incident management that can grow easily to support an increasing number of customers. In the past, the team lacked confidence in how best to manage some aspects of the incident process. incident.io has provided a best-practice framework and enabled the team at Altis to codify it, so that incidents are handled consistently every time.
I think in some cases we were like, "I don't know what we should do here," and incident.io has helped us shape that.
By bringing structure, providing a dedicated space for each incident, and automating processes, incident.io has helped the engineers at Altis to deal with issues much more efficiently. For example, instead of engineers having to ask for updates and offer assistance, incident.io streamlines this flow by clearly assigning roles and actions and by providing timely nudges to keep the incident running smoothly.
The automated reminders and things like that are really handy... because it's automated, it doesn't feel stressful.
The web dashboard has enabled Altis to get a much clearer picture of the incidents that are occurring over time. Previously, the team would have to review PagerDuty triggers to work out how many incidents had taken place, but often this meant sifting through a large number of alerts that were not “real” incidents. It’s now really easy to get a snapshot of real incidents in any given month, and make the improvements needed to downgrade or eliminate them in the future. Access to this data, alongside timelines and postmortems, helps Altis to provide evidence as part of customer and compliance audits.
Now it's much, much easier for us to go back and say, "Well, how many actual incidents did we have this month versus alerts? What are we doing to improve upon this to either eliminate those entirely or to downgrade them to just alerts?"