It’s been almost 2 years since our last post on our data stack, and an update is long overdue! This post will give an overview of what our current setup looks like, and dive a bit more into some highlights like:
As in the last post on our data stack, we’ll start with a quick overview of how everything fits together. Our stack can be thought of as three broad sections: Extract & load (EL), Transform (T), and Analyze.
Our Engineering team recently carried out a migration to Explo for our Insights product, which allows our customers to get a deeper understanding of their own data—like how much time is being spent on incidents, which users are being paged most, and how diligent teams are in completing post-incident flows.
Explo is purpose-built for embedded analytics, and enables:
Earlier this year the Data team made the move to Omni analytics, built by some of the early team from Looker, for our internal BI tooling—and we’ve been very impressed with it so far!
We needed a tool that:
Over a ~4-week trial period, it was clear that Omni fit our use case nicely, striking a solid balance between governance and flexibility. It achieves this through its concept of model layers:
The last part is particularly useful: you can create a set of measures, joins, or custom columns to reuse within a single dashboard without affecting the company-wide shared model. Should you want to write back to the shared model, you can cherry-pick which changes to push back, or just push everything.
There were a bunch of other features we really liked as well: the ability to write Excel formulas and have them translated into SQL, custom markdown visualisations (example below), being able to embed just about anything into a dashboard (e.g. Gong calls), and how easy it was to embed dashboards into Salesforce to bring insights directly to our Commercial users.
Whilst Omni enables ad-hoc analysis across the company, there are times when we want to step through multiple SQL transformations, or use Python in combination with SQL.
Hex does an excellent job of this—you can think of it as a supercharged Jupyter notebook that makes it really easy to do ad-hoc / deep-dive analyses. We use it extensively as a Data team and it’s a really valuable tool in our stack.
We onboarded our first Data Engineer a few months ago, and he’s since made several improvements to our setup that he outlines in a separate blog post.
But one thing worth calling out specifically, because it has been a huge quality-of-life improvement, is how we run dbt locally.
Anyone familiar with dbt knows the pain of editing a model, then having to run everything upstream of it before being able to test your changes. There are, of course, a few workarounds:
- Getting hold of a production `manifest.json` file to use for state
- Creating a clone of the production data in your development environment, which still takes some time to carry out (plus in BigQuery you can’t set clones to expire)
- Checking out the main branch and running `dbt compile` to generate a `manifest.json` file, and moving it out of the target folder

The third option is the cleanest, but is a bit of a pain to do whenever you want to do any local dev work. Dry runs / limited data runs are still a good idea in general—but at our size of datasets, running individual models is relatively fast and we can use incremental builds in any outlier cases.
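For context, a deferred dbt command looks something like this (the model name and state directory here are made up for illustration):

```bash
# Run a single model locally; any upstream models you haven't built are
# resolved against production via the manifest.json inside prod-manifest/.
dbt run -s my_model --defer --state prod-manifest/
```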
A really nice solution that our Data Engineer implemented was to upload the latest `manifest.json` from the most recent production run of dbt (when a code change is deployed) to Google Cloud Storage, which we can then pull locally to use for `--defer --state`.
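The exact code isn’t reproduced here, but the local pull is conceptually just copying that file from a fixed GCS path into a directory you can point `--state` at. Here’s a minimal sketch with a made-up bucket and path, using `gsutil` for brevity (the real implementation, as described next, is Python wrapped in a bash script):

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the manifest.json uploaded by the latest
# production dbt run into a local folder we can pass to --state.
# The bucket and object path are placeholders, not incident.io's real ones.
set -euo pipefail

MANIFEST_URI="gs://example-dbt-artifacts/prod/latest/manifest.json"
STATE_DIR="prod-manifest"

mkdir -p "$STATE_DIR"
gsutil cp "$MANIFEST_URI" "$STATE_DIR/manifest.json"
```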
The Python code to pull the latest manifest is wrapped in a bash script, and this allows us to have some really simple bash aliases to perform any type of `--defer`-based dbt command locally:
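The aliases themselves aren’t reproduced here either, but assuming a pull script like the sketch above (saved as, say, `scripts/pull_prod_manifest.sh`), they might look roughly like this. `mfest` is the name used below; `drun` and `dbuild` are made up for illustration:

```bash
# Hypothetical shell aliases; only "mfest" is a name actually used in this post.
# mfest: fetch the latest production manifest, then have dbt (re)compile the
#        manifest for the current branch ("-s nothing" selects no models).
alias mfest='./scripts/pull_prod_manifest.sh && dbt ls -s nothing'

# drun / dbuild: run or build only the models you select, deferring anything
# not built locally to production via the downloaded manifest.
alias drun='dbt run --defer --state prod-manifest -s'
alias dbuild='dbt build --defer --state prod-manifest -s'

# Example: run a single model, with upstream refs resolved against production.
#   mfest
#   drun my_model
```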
`mfest` downloads the latest production manifest file, then runs `dbt ls -s nothing`, which does nothing apart from getting dbt to compile the manifest file for the current branch and is faster than `dbt compile`. The other commands then run, build, or test whatever models are passed to `-s`, deferring to production.

We use our own product extensively at incident.io, and the Data team is no exception.
If the data pipeline fails, it’s an incident—and since you’re doing the work anyway to rectify the issue, you might as well track it!
Data incidents will typically fall into one of two categories:
Previously, we’d create an incident manually if we were notified by email, via CircleCI, that our data pipeline had failed. This captured the first type of incident above, but the second type was often missed, as the first step was usually to hit retry on the failing job.
Now, using our On-call product that we launched earlier this year, we can:
- use the `@data-responder` Slack group, so that anyone who needs help resolving an incident doesn’t need to know who’s on the rota—they just tag the group

Quite a lot has changed over the past couple of years, but the core structure of our stack and our dbt repo have scaled well: Fivetran remains a very effective tool for syncing data to BigQuery (perhaps not surprising for an earlier-stage B2B company), and we’ve not drastically changed how our dbt repo is set up.
But the tools and vendors available in the data space are always changing—and data stacks rarely stay static as a result. There’s a fine (and difficult) balance to strike between focusing on executing with the tools you already have, and spending the time and money to bring new ones in.
We’ll continue to adapt our stack as new tools become available and as we hit new challenges while scaling—but hopefully this time it won’t be 2 years until we next write about it!