First look: behind the scenes building an AI incident responder.
December 24, 2024
Tired of waking up to incident names like this?

[CRITICAL] k8s-prod-cluster-east1: High pod eviction rate (>45%) detected in default namespace causing cascading OOM errors across statefulset replicas (oops-i-oomed-again-service-v2)
Would you prefer it said:

Pod evictions in east1 production cluster. oops-i-oomed-again-service-v2 affected, with failures spreading across replicas.

We've just rolled out AI-powered incident naming and summarization that automatically generates clear, descriptive names and summaries based on your alert data.
You can enable AI naming for specific incidents in your alert route while keeping manual naming for others, giving you the flexibility to customize based on your team's needs.
For example, you might want to:
Also, we don’t modify the name of the underlying alert - meaning that if you, like us, have gotten used to blearily reading alert-name-soup, you’ll still be able to make use of your pattern-recognition abilities!
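If you're curious how a name like the one above might be produced from raw alert data, here's a minimal sketch in Python. The prompt, the gpt-4o-mini model via the OpenAI SDK, and the generate_incident_name helper are illustrative assumptions for this post, not a description of our production pipeline.

```python
# Hypothetical sketch: turn a raw alert payload into a human-readable
# incident name with an LLM. The model, prompt, and helper name are
# assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_incident_name(alert: dict) -> str:
    prompt = (
        "Rewrite this alert as a short, plain-English incident name. "
        "Lead with what's broken and where, keep service identifiers verbatim, "
        "and drop thresholds and namespace noise:\n\n"
        f"{alert}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=60,
    )
    return response.choices[0].message.content.strip()


alert = {
    "title": "[CRITICAL] k8s-prod-cluster-east1: High pod eviction rate (>45%) "
             "detected in default namespace causing cascading OOM errors across "
             "statefulset replicas (oops-i-oomed-again-service-v2)",
    "severity": "critical",
}
print(generate_incident_name(alert))
```

The underlying alert payload is passed through untouched, which is what keeps the original alert name intact alongside the generated one.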
When alerts come in from Grafana, the first step is often to check the dashboard that triggered the alert. Now, with upgrades to our Grafana integration, we’ll pull a screenshot of the dashboard and place it in the channel.
You can enable this by visiting Settings → Integrations → Grafana.
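For the curious, a dashboard screenshot like this can be fetched from Grafana's HTTP rendering endpoint, provided the image renderer is installed. The host, token, dashboard UID, and panel ID below are placeholders, and this is a simplified sketch rather than how our integration is implemented.

```python
# Hypothetical sketch: fetch a PNG of the panel behind an alert using
# Grafana's /render endpoint. Host, token, dashboard UID, and panel ID
# are placeholders; the Grafana image renderer must be available.
import requests

GRAFANA_URL = "https://grafana.example.com"
TOKEN = "service-account-token"


def fetch_panel_screenshot(dashboard_uid: str, panel_id: int) -> bytes:
    response = requests.get(
        # The slug segment after the UID is ignored by Grafana, so any
        # placeholder works here.
        f"{GRAFANA_URL}/render/d-solo/{dashboard_uid}/dashboard",
        params={
            "panelId": panel_id,
            "width": 1000,
            "height": 500,
            "from": "now-1h",
            "to": "now",
            "tz": "UTC",
        },
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.content  # PNG bytes, ready to post into the incident channel


with open("dashboard.png", "wb") as f:
    f.write(fetch_panel_screenshot("abc123", 2))
```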