Incident management for data teams


If you're on a data team, have you ever considered using an incident management tool to respond to pipeline issues? If the answer is no, then you might want to check out this episode.

Here, we chat with Jack, Data Analyst at incident.io, to better understand why data teams can—and should—look to incident management tools like incident.io to manage issues. We chat about:

  • Why there's a big push for data teams to adopt the practices of engineering teams—from tools to processes
  • Why incident management tools aren't just for engineers
  • How an incident management tool like incident.io can transform the way you respond to things like pipeline issues
  • ...and much more

Read Jack's blog post about incident management for data teams.

The transcript below has been generated using AI and does not match the audio. It is a highly summarized version of the conversation you'll hear.

Luis: You're a data analyst here at incident.io. Can you tell us more about your background, what interests you about data, and how you ended up here at incident.io? It's a broad question, but I think it's helpful to get some background.

Jack: I spent a couple of years consulting on various project work. My first move into a pure data role was at Monzo, where I worked in finance data. I spent a couple of years there and met two of the founders of incident.io. When the opportunity at incident.io came up, it was a chance to come in at the start, set up the basics, and work across various departments. I enjoy working between the business and technical sides, turning complex technical information into actionable insights.

Luis: For someone unfamiliar with a data analyst's role, what does a typical day look like for you?

Jack: We spend a lot of time embedded with teams, working directly with product managers and VPs on the commercial side. Our work involves understanding questions people want to answer, ensuring we have the right information, and presenting it in actionable ways, whether through dashboards, Slack alerts, or write-ups in Notion.

Luis: You mentioned attending a dbt conference. What did you learn there?

Jack: dbt is an industry-standard tool for transforming data. The conference focuses on how to run dbt at scale, addressing issues like running multiple dbt projects concurrently and making metrics reusable. The recent emphasis has been on governance and scale, adapting to the tool's adoption by larger companies.

Luis: You wrote a blog post on incident management for data teams. What trends do you see in data teams operating more like engineering teams and using the tools engineers typically use?

Jack: There's a trend towards data teams adopting practices from software engineering, with a focus on tools that help spot issues. However, there's a gap in processes for handling incidents and communicating when things go wrong. The blog post reflects on the need for processes to deal with issues proactively.

Luis: Reflecting on your time at Monzo, how would you manage a pipeline error in the absence of an incident management tool?

Jack: At Monzo, incident management felt more like an engineering channel. Communication was often inconsistent, and debugging and updates were done in threads, making it challenging to track and coordinate. The blog post was inspired by these experiences and the need for a well-defined process.

Luis: How did you decide to use incident.io, and can you share a recent incident management experience?

Jack: Initially, we didn't plan on using an incident management tool. However, as we encountered repeated issues, we saw the need for better visibility and processes. incident.io helped us bring in the right people, streamline communication, and document our responses. It provided a structured and efficient way to manage incidents.

Luis: Have you measured the time saved by using an incident management tool, and how challenging was the onboarding process?

Jack: Quantifying time saved can be challenging, but the tool brings consistency and control to incident response. Automations and customization help streamline processes, making it easier for data teams to respond effectively. Onboarding was straightforward, and the tool's prompts and automations guide users through the process.

Luis: Do you find your team declaring more incidents with incident.io?

Jack: Yes, the team is more confident in declaring incidents with incident.io. It reduces key person dependencies, fosters collaboration, and provides a consistent experience for everyone. The tool incentivizes teams to declare incidents by offering better visibility into the time spent on issue resolution.

Luis: If someone decides to start using an incident management tool like incident.io, how will their daily life change?

Jack: Using an incident management tool brings consistency, control, and visibility. It reduces dependencies on key individuals, offers better control over incident response, and provides visibility into team activities. Overall, it enhances the efficiency and effectiveness of managing incidents for data teams.
