At incident.io, we ship fast. We're talking multiple times a day, every day (yes, including Fridays). Once I merge a pull request (PR), my changes rocket their way into production without me lifting a finger. 💅 It's when we tackle larger projects that this becomes a bit more complicated.
We recently launched Announcement Rules, which let you configure which channels incident announcements are posted in depending on criteria you define. That piece of work took us a few weeks, split over multiple PRs, and we couldn't put a half-finished feature in front of customers.
It's not a question of putting everything in one PR; the larger a PR gets, the riskier it becomes, as it's more difficult for someone to review confidently and a lot more work to roll back if something goes wrong. We also don't want to work on separate branches for too long, as we need to make sure everything we're building works together seamlessly, and working on a branch for a long time will leave us with a nightmare of merge conflicts when we do finally merge the branch back in.
We need to be able to deploy tiny bits of a feature consistently, building up to the finished product, and be able to hide those bits from our customers until the feature is ready for its debut. Feature-flagging - the process of enabling and disabling features programmatically - allows us to do exactly that.
In a feature-flagging system, we can control visibility of specific features (in our case, Announcement Rules) by switching them on or off.
The most common kind of flag is a boolean flag, which can either be enabled or disabled. In its simplest implementation we might have a boolean variable announcementRulesEnabled which we initialise to false in production, but set to true in development so that only we can see it while we work on it. Our app code would check the value of this variable and only display the new feature if it was true. Once the feature's ready for release, we can set the variable's value to true in production, thereby enabling the flag.
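As a rough illustration, that simplest version might look something like the Go sketch below. The APP_ENV variable, the settingsLinks helper and the menu entries are all made up for the example; only the announcementRulesEnabled name comes from the description above.

```go
package main

import (
	"fmt"
	"os"
)

// announcementRulesEnabled is the simplest possible boolean flag:
// on in development, off everywhere else. Releasing the feature means
// changing this function so it also returns true in production.
func announcementRulesEnabled(env string) bool {
	return env == "development"
}

// settingsLinks is a hypothetical piece of app code that checks the flag
// and only shows the new feature when it is enabled.
func settingsLinks(env string) []string {
	links := []string{"General", "Users"} // illustrative menu entries
	if announcementRulesEnabled(env) {
		links = append(links, "Announcements")
	}
	return links
}

func main() {
	// APP_ENV is an assumed way of telling environments apart.
	env := os.Getenv("APP_ENV")
	fmt.Println(settingsLinks(env))
}
```

This works for one flag and one environment switch, but it needs a deploy to flip, which is part of why a dedicated flagging system becomes attractive.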
There are a few ways of approaching feature-flagging, including building your own system, or using a third-party one. Having used a few custom-built feature flag systems over the years, I find it's almost always better to buy than to build - I strongly believe you should build the systems that differentiate you from your competition, and buy the ones that don't.
Feature-flagging is not a differentiator for us, and having another system to maintain would end up costing us more in the long term - especially as you add more dimensions to the flags, such as percentage-based or attribute-based rollouts ("enable this for all users who have this app version"). That's why we opted to use LaunchDarkly, a feature management platform-as-a-service.
Now, I've had plenty of experience integrating third-party libraries into apps, and fighting with poor documentation and strange library behaviour. But we managed to get LaunchDarkly implemented across the backend and frontend in a single day. It completely surpassed my expectations, and has made the entire feature-flagging process totally seamless. (I promise they're not paying me to say this. It's so good.)
Helpfully, LaunchDarkly provide SDKs for loads of different platforms and languages, so we were able to wrap their Go SDK in our own featureflags package which enables us to quickly and easily query a user's feature flags. We have three environments set up on the LaunchDarkly dashboard - development (local), staging and production - and when you create a flag in one, it automatically gets populated across all of them.
Using a wrapper for the SDK allows us to reduce repetition and keep the list of flag names in one place (no passing in typo-riddled strings, thanks). To make this even more robust, we plan to add a check in CircleCI to compare the list of flags we get back from LaunchDarkly with the constants in the featureflags library, to make sure they actually exist in the dashboard.
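A check like that could be sketched roughly as follows. This is not our actual implementation: the Flag type, the "announcement-rules" key and the missingFlags helper are assumptions for illustration, and fetching the flag list from LaunchDarkly is left out, with the remote keys simply passed in as a slice.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// Flag is a typed flag name, so callers can't pass arbitrary strings.
type Flag string

// Hypothetical flag key; the real keys live in the dashboard.
const AnnouncementRules Flag = "announcement-rules"

// allFlags lists every constant the wrapper package knows about.
var allFlags = []Flag{AnnouncementRules}

// missingFlags returns the defined flags that are absent from the
// list of keys reported by the flag provider.
func missingFlags(defined []Flag, remote []string) []Flag {
	exists := make(map[string]bool, len(remote))
	for _, key := range remote {
		exists[key] = true
	}
	var missing []Flag
	for _, f := range defined {
		if !exists[string(f)] {
			missing = append(missing, f)
		}
	}
	sort.Slice(missing, func(i, j int) bool { return missing[i] < missing[j] })
	return missing
}

func main() {
	// In CI this slice would come from the provider; here it is hard-coded.
	remote := []string{"announcement-rules", "some-other-flag"}
	if missing := missingFlags(allFlags, remote); len(missing) > 0 {
		fmt.Println("flags missing from the dashboard:", missing)
		os.Exit(1) // a non-zero exit fails the CI job
	}
	fmt.Println("all flags exist in the dashboard")
}
```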
When the backend starts up we initialise the LaunchDarkly client as a singleton. Our code can then pass in the currently authenticated user's user ID and organisation ID to the library to get back a scoped client which we can query directly for specific flags.
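To make the shape of this concrete, here's a minimal sketch of a singleton plus a scoped client. Everything in it is an assumption for illustration: the evaluator interface stands in for the vendor SDK client (it is not LaunchDarkly's API), staticEvaluator is a toy backend, and Init, For and Enabled are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Flag is a typed flag name kept in one place.
type Flag string

const AnnouncementRules Flag = "announcement-rules" // hypothetical key

// evaluator stands in for the vendor SDK client (assumed interface).
type evaluator interface {
	BoolFlag(key, userID, organisationID string, fallback bool) bool
}

// staticEvaluator is a toy backend: flag key -> organisation ID -> enabled.
type staticEvaluator struct {
	enabledOrgs map[string]map[string]bool
}

func (s staticEvaluator) BoolFlag(key, userID, organisationID string, fallback bool) bool {
	orgs, ok := s.enabledOrgs[key]
	if !ok {
		return fallback // unknown flag: use the caller's default
	}
	return orgs[organisationID]
}

var (
	once   sync.Once
	shared evaluator
)

// Init stores the process-wide evaluator; only the first call has any effect.
func Init(e evaluator) {
	once.Do(func() { shared = e })
}

// Client is scoped to one authenticated user and their organisation.
type Client struct {
	userID, organisationID string
	eval                   evaluator
}

// For returns a scoped client backed by the shared evaluator.
func For(userID, organisationID string) Client {
	return Client{userID: userID, organisationID: organisationID, eval: shared}
}

// Enabled reports whether a boolean flag is on, defaulting to off.
func (c Client) Enabled(f Flag) bool {
	if c.eval == nil {
		return false
	}
	return c.eval.BoolFlag(string(f), c.userID, c.organisationID, false)
}

func main() {
	Init(staticEvaluator{enabledOrgs: map[string]map[string]bool{
		"announcement-rules": {"org-beta": true},
	}})
	fmt.Println(For("user-1", "org-beta").Enabled(AnnouncementRules))
	fmt.Println(For("user-2", "org-other").Enabled(AnnouncementRules))
}
```

Defaulting to "off" when the client isn't initialised or the flag is unknown is a design choice in this sketch: an unfinished feature staying hidden is usually the safer failure mode.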
This wrapper allows us to inject a feature flag client, making it easy to inject a mock client for testing. Whenever you're building features behind a flag, it's really important to test both the enabled and disabled versions of the code, to make sure both paths work as expected. I can think of many times in the past where we've written extensive tests for a new feature, but not covered the case where it's turned off (and subsequently realised that path is completely broken, and we had no idea because we were all using our local environments with the flag enabled).
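Here's a small sketch of what exercising both paths with a mock might look like. The FlagClient interface, the mockFlags type, the channelsFor function and the channel names are all hypothetical; the point is only that the code under test takes the flag client as a dependency, so a test can run it with the flag on and off.

```go
package main

import "fmt"

// FlagClient is the assumed interface the app code depends on.
type FlagClient interface {
	AnnouncementRulesEnabled() bool
}

// mockFlags is a test double with a fixed answer.
type mockFlags struct{ announcementRules bool }

func (m mockFlags) AnnouncementRulesEnabled() bool { return m.announcementRules }

// channelsFor is hypothetical code under test: with the flag off it
// falls back to a single default channel, with it on it uses the
// channels chosen by the customer's rules.
func channelsFor(flags FlagClient, ruleChannels []string) []string {
	if !flags.AnnouncementRulesEnabled() {
		return []string{"#incidents"}
	}
	return ruleChannels
}

func main() {
	rules := []string{"#payments-incidents"}
	// Exercise both the enabled and the disabled path.
	for _, enabled := range []bool{true, false} {
		got := channelsFor(mockFlags{announcementRules: enabled}, rules)
		fmt.Printf("flag enabled=%v -> %v\n", enabled, got)
	}
}
```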
In our web UI, we use the LaunchDarkly React SDK. Once a user logs in, we initialise the LaunchDarkly client with that user's ID, organisation ID, and any other attributes we want to be able to feature-flag against. The LaunchDarkly SDK gives us a handy React hook to easily access a user's flags, so we can make sure anyone who doesn't have the flag enabled won't see the Announcements link in the settings menu.
Implementing feature flagging has enabled us to continue moving quickly while still staying in control of what we put in front of customers. Most of what we do is on the organisation level rather than targeting individual users, so we're much more likely to turn features on for specific organisations. It means we can ask a subset of our customer organisations to beta test a feature, and get fast feedback before we launch to everyone.
Right now we're relying on boolean flags, but as the complexity of our organisation and customer base grows, so will the complexity of our flags. It's reassuring to know we won't need to make changes to our implementation to accommodate that. LaunchDarkly gives us the flexibility to add custom attributes for users when we initialise the client; in the future we could feature-flag against anything we wanted, and use string-based or even JSON-based flags if we needed to.
I suppose you could say we're looking forward to hiding even more new and exciting features from our customers... not least for the intensely satisfying experience of ceremonially flipping a switch to release something new!