
Looking ahead to KubeCon Europe 2023

We’re eagerly awaiting the start of KubeCon Europe 2023 next week! Fresh off our first conference appearance at SRECon, we’re excited to meet even more folks in the broader engineering community and chat all things incident response.

We learned a lot about making a proper splash at conferences and took away actionable feedback from the folks we spoke with at SRECon, so we’re looking forward to putting some of those learnings into practice in Amsterdam.

But while we’re counting the hours to KubeCon, we wanted to highlight some incident response-related talks we’re keen on attending.

One quick but important disclaimer: many of the talks at KubeCon treat incidents primarily as security events, such as data breaches and hacks. While we’d absolutely agree that these are incidents, we like to take a more expansive view.

We’ve written extensively about our approach in several blog posts, but in short, here are a few examples of issues we’d also call incidents:

  • Too few food delivery riders on shift, causing delivery ETAs to spike, is an operational and product incident.
  • Your largest customer threatening to churn unless you renegotiate their contract is a customer success incident.
  • And on the more technical side, intermittent downtime caused by repeated crashes is an incident, too.

With this approach, we believe businesses can resolve everyday issues more predictably, since they can follow a consistent response process of declaring, triaging, investigating, fixing, monitoring, and closing incidents.

This eliminates the need for siloed and ad hoc resolutions. In the end, by reframing the way you think of incidents, you can resolve issues faster and turn incident response into a company-wide effort.

With that, let’s dive right in.

Automated Cloud-Native Incident Response with Kubernetes and Service Mesh - Matt Turner, Tetrate & Francesco Beltramini, Control Plane

Presented by Matt Turner of Tetrate and Francesco Beltramini of Control Plane, this talk will cover incident response basics specifically for cloud-native technologies. They’ll also demonstrate a response to a Log4Shell attack against a workload running in a Kubernetes cluster.

Anatomy of a Cloud Security Breach - 7 Deadly Sins - Maya Levine, Sysdig

There will be lots of talks about protecting yourself from security breaches at KubeCon, but this one in particular stands out. Presented by Maya Levine of Sysdig, this talk will walk through seven real-world examples of cloud breaches and share insights based on analysis from the Sysdig Threat Research Team.

Their insights into attack and response patterns should give attendees actionable advice on how to better protect themselves and respond to incidents in cloud environments.

Disaster Recovery: Bringing Back Production from Scratch in Under 1 Hour Using KOps, ArgoCD and Velero - Andre Jay Marcelo-Tanner, Ada Support

We know a thing or two about recovering from incidents, so we’re excited to check out this talk.

Presented by Andre Jay Marcelo-Tanner of Ada Support, this talk will walk through a real operational incident caused by a misconfiguration, where the only way to get things working again was to rebuild the entire cluster from scratch.

Based on the description, this sounds like it’s going to be a very insightful talk.

incident.io: taking the effort out of incident management

We understand that incident management tools on the market tend to be overly complex, hard to use, and time-consuming to onboard.

So we take a different approach here at incident.io.

We believe that incident management should work for you and help improve your response times.

Here’s how we help companies meaningfully improve their incident management and response:

Making incidents more transparent

Here at incident.io, we believe in the power of transparency and visibility. We think they're the secret ingredient behind highly successful companies across the spectrum, and that belief carries over to how we approach incident response.

By default, when you declare an incident within our product, a Slack channel is automatically created to serve as a single source of truth for your response. This eliminates the need for backchannels of communication and creates a more visible incident response process.

Within that channel, all response actions are visible to everyone. So whether you designate a new incident lead, change the severity, or take any other action, everyone stays in the loop.

Introducing automation into the incident response flow

We believe in making incident management more straightforward. To help, we’ve built automations that give businesses the tools to streamline their incident response. We call these Workflows.

They’re a mix of automations and triggers that help guide your response process, leading to quicker time to resolution and less downtime.

For example, if someone declares an incident of a certain severity, you can create an automation that alerts a specific group of people when that happens. You can also auto-generate prompts that appear when particular actions are taken.

Say someone closes out an incident; you can then create a prompt to remind them to complete a post-mortem within a day or two.

We feel that these automations make it much easier for anyone to declare incidents, regardless of technical ability.

Prioritizing learning from your incidents

Building a more resilient product is hard if you can't learn from your incidents!

To make learning from your incidents easier, we've created an Insights dashboard that gives you visibility into several relevant metrics.

This includes breakdowns of your severity types, incident response times, how much time specific teams spend on incidents, and more.

With this insight, you can create action items to address areas for improvement in your incident response, such as adding more people to your on-call rotation or updating your severity levels.

We'll see you at KubeCon Europe!

We couldn't be more excited to see you all at KubeCon next week! If you're attending the conference, please stop by booth #P10 to say hello! We'll have plenty of swag, fun giveaways, activities, and more.

If you want to read up on what incident.io is all about, check out our conference page here. And if you're ready to dive in and learn even more, book a demo before the conference. We'll be more than happy to answer any questions you have.

See you in Amsterdam!

Luis Gonzalez
Content Marketing Manager
