We rolled out user roles a few weeks ago, and this post goes behind the scenes with how we:
We’ve been pretty lucky at incident.io to be able to avoid dealing with more complex authentication issues for quite a while, because we piggy-back on Slack to know who you are and which organization you work in. Whole companies have been built around doing authentication and user profiles really well, so it was pretty neat to be able to avoid doing most of that work for so long!
We’ve known we won’t be able to avoid tackling this problem for ever though. There are a few features on our roadmap that will require permission levels, but the launch of Private Incidents, made us realise that we couldn’t put this off any longer. Some people would need to be able to have oversight of even private incidents, but we couldn’t let everyone do that, or they wouldn’t really be private any more.
Before we begin, there’s some jargon in the world of permissions that will help here:
As anyone who’s ever built anything in Google Cloud or AWS will tell you, there are some incredibly powerful and flexible permission models around, which are essential for big, complex, and flexible cloud platforms, but would be a burden for us to maintain, and for our customers to use.
On the other hand, we don’t want the permission levels we decide when we’re a small company to become constraints we’re always hitting against a few years down the line.
Our middle ground is to have a fixed set of roles that we offer to all our customers, but which become a set of granular scopes behind the scenes. That means we only have to define things like “an owner can see private incidents” and “administrators can change billing info” in one place, but the billing flow just asks “can this person change billing info?”
We rolled this check into our existing getIdentity
function which gets the authenticated organization from the request context:
func getIdentity(ctx context.Context, requiredScopes ...rbac.Scope) (*domain.Organisation, *domain.User, error) {
...
// Check that all scopes are present on the user.
for _, requiredScope := range requiredScopes {
if err := rbac.Ensure(ctx, org, user, requiredScope); err != nil {
return nil, nil, err
}
}
return org, user, nil
}
That means adding permissions to checks to all our APIs was really simple:
func (svc *announcementRulesService) Create(ctx context.Context, payload *announcementrules.CreatePayload) (*announcementrules.CreateResult, error) {
- org, creator, err := getIdentity(ctx)
+ org, creator, err := getIdentity(ctx, rbac.ScopeAnnouncementRulesCreate)
We also needed to make the frontend app aware of these permissions, otherwise we’d waste people’s time letting them make a whole bunch of changes they aren’t allowed to save.
Our type-safe generated client made this much simpler! We include the list of possible scopes as an enum in our API design, which means the frontend can refer to scopes in a type-safe way. We return the user’s current list of scopes to the frontend as part of their identity, so checking if something should be available is relatively neat:
const hasScope = (scope: ScopeResponseBodyNameEnum): boolean => {
return identity?.scopes?.map((x) => x.name).includes(scope);
};
We can now have lovely safe components like a GatedButton
that can disable itself for anyone without the relevant scope, and tell you why you're locked out ⛔
This isn’t the kind of feature you want to just ship all in one go, because there’s so many ways it can go wrong:
We solved this by putting our users in control of when they want this feature. This is a new timestamp that we store for each organization which marks when we’ll start applying permissions for them. We set a deadline of March 1st for turning this on, but any organization owner could opt-in earlier than that.
Before that deadline, all our other features will work as before, but even owners won’t be able to see everyone’s private incidents.
That meant that getting our initial choice of owners and admins wrong was less risky: although a poorly-chosen owner could hit the “enable roles” button and see all private incidents, that would be an active bad choice from them. However, we didn’t want to make everyone an owner and let people lower privileges manually - that would be a huge amount of work for some of our larger customers!
We started by thinking about how we’d handle this for new signups. The only option there is to make the first person who installs incident.io the account owner, so we decided to treat our existing customers in the same way, and send the chosen owners a message asking them to make sure to re-assign it if we’d made a poor choice.
For administrators, we looked at product usage data, and identified anyone who’d done anything that would require the administrator role in future, and made them an administrator. Our message to the owner made it clear that they should review the administrators as well, and for most organizations this was pretty close to correct.
This meant we could build and test the new permissions system behind the scenes, and safely roll it out to our existing customers over a month.
As our product grows in complexity we’ll be able to extend this system, but for now it’s nice and simple for us to understand, and hopefully easy to use too.
We created a dedicated page for Anthropic to showcase our incident management platform, complete with a custom game called PagerTron, which we built using Claude Code. This project showcases how AI tools like Claude are revolutionizing marketing by enabling teams to focus on creative ways to reach potential customers.
We examine both companies' comparison pages and find some significant discrepancies between PagerDuty's claims and reality. Learn how our different origins shape our approaches to incident management.
The EU AI Act introduces new incident reporting rules for high-risk AI systems. This post breaks down what Article 73 actually mandates, why it's not as scary as it sounds, and how good incident management makes compliance a breeze.
Ready for modern incident management? Book a call with one our of our experts today.