AWS re:Invent 2025: The top sessions SREs should attend

November 20, 2025

If you’re heading to AWS re:Invent this year, you already know the struggle: hundreds of sessions, all sounding useful, all happening at the same time, and only one of you to go around. Whether you’re deep in SRE land, running incident response, or just trying to build more resilient systems, finding the right sessions can feel like a full-time job.

That’s why we pulled together a curated list of the must-see talks for anyone who cares about reliability, on-call life, cloud resilience, or keeping production from melting down at 2 a.m. These AWS re:Invent sessions are packed with real-world architecture lessons, resilience best practices, and incident-ready tactics you can bring back to your team.

If you want to make the absolute most of your week in Vegas (without scrolling the catalog until your eyes blur), start here.

We’ll be there!

Come see our team at Booth #362. We’ll be giving away swag like our famous socks, flame plushies, tote bags, and more. Plus, we’ll be raffling off a pair of AirPods Max. 🔥

On Tuesday, December 2, join us after the conference for a special happy hour at the F1 Arcade. We're taking over the world's largest F1 Arcade for a night of racing simulators, premium food and drink, and mingling with others in the field.

You can register for the happy hour, book a meeting with a rep on-site, and see more details about incident.io x AWS here.


Know before you go: Session types

  • Breakout sessions are 60-minute lectures by AWS experts, with most sessions available on-demand after the event.
  • Chalk talks combine a 10–15 minute lecture with an interactive, 45–50 minute Q&A where experts and attendees solve problems together using a digital whiteboard.
  • Workshops are two-hour hands-on learning experiences where small teams collaborate to solve challenges using AWS services, guided by a short introductory lecture.
  • Builders’ sessions are 60-minute interactive, hands-on sessions led by an AWS expert, focused on building directly in AWS and encouraging live Q&A.
  • Lightning talks are fast-paced, 20-minute stage demos showcasing key ideas or solutions.
  • Code talks are 60-minute sessions featuring live coding demonstrations, with opportunities for attendees to engage deeply and ask questions about the speaker’s approach.

Our advice: How to pick & prepare

  • Prioritize sessions where you’ll be able to ask questions about failure modes, recovery tooling, observability & monitoring, and incident response culture/practice.
  • Block time after each session to capture how the learnings map to your own environment (what you’d change in your incident process or tooling).
  • Because SRE/incident roles often straddle teams, use these sessions to start cross-team discussions (for example: “How could we adopt fault isolation boundaries?”, “What resilience testing must we add to our release pipeline?”).
  • Reserve seats early - many of these sessions are popular with the resilience and operations crowd.
  • Bring a notebook and pen! This one might seem pretty basic, but you probably won’t remember everything you just heard, so write down a few key points that stand out to you.

Building on AWS resilience: Innovations for critical success

Breakout session

Why you should attend:

If you want to understand how AWS designs for extreme resilience at a global scale, this session is for you. It digs into the patterns, failure assumptions, and architectural guardrails AWS uses when building its most critical services. For incident response managers, this is a rare behind-the-curtain look at how to design for failure - not just react to it.

What you’ll learn:

You’ll walk away with practical strategies for adopting AWS-style resilience patterns in your own systems. That includes how to build for availability, reduce fragility, and think about failure domains the way AWS does. It’s directly applicable to improving your incident preparedness and upstream architecture.

Speakers: Amazon Web Services, Inc.

Time commitment: 1 hour


Resilience testing and AWS Lambda actions under the hood

Breakout session

Why you should attend:

Serverless introduces a whole new set of operational challenges - cold starts, distributed execution, and limited visibility. This session explores how AWS tests Lambda’s resilience using chaos engineering and fault-injection techniques that uncover real-world failure scenarios before customers ever hit them.

What you’ll learn:

Expect to learn how to apply resilience testing to your own serverless or event-driven architectures, including how to validate assumptions, uncover hidden failure modes, and reduce surprises during on-call. You’ll leave with concrete ideas to bring back to your incident response or SRE team.

Speakers: Amazon Web Services, Inc.

Time commitment: 1 hour


Defend against downtime using fault isolation boundaries

Breakout session

Why you should attend:

Every SRE knows that clear failure domains are one of the strongest tools you have for reducing downtime. This session breaks down how fault isolation boundaries - across AZs, Regions, and workloads - can dramatically limit blast radius and speed recovery when something does go wrong.

What you’ll learn:

You’ll gain an understanding of how to apply AWS’s fault isolation patterns in your own architecture and how tools like Amazon Application Recovery Controller (ARC) can support faster recovery. For incident managers, this is a blueprint for fewer unknowns and smoother response during high-severity events.

Speakers: Amazon Web Services, Inc.

Time commitment: 1 hour


Cloud operations: Manage, operate, govern your AWS, hybrid & multicloud environment

Topic track

Why you should attend:

If your responsibilities span ops, governance, on-call, observability, and reliability, the Cloud Operations track is your home base. This track is built around improving productivity, reducing complexity, managing multicloud/hybrid systems, and building for operational resilience.

What you’ll learn:

Sessions in this track cover everything from day-2 operations and monitoring strategies to large-scale governance and cross-environment incident management. It’s ideal for SREs, incident responders, and platform teams looking to uplevel operational maturity.

Speakers: Various AWS speakers

Time commitment: Varies by session


From ideas to impact: Architecting with cloud best practices

Breakout session

Why you should attend:

The AWS Well-Architected Framework has shaped cloud best practices for a decade - and this session walks through its evolution, lessons learned, and the architecture patterns that matter most today. It’s a great pick if you want to align your incident tooling and operational practices with proven architectural foundations.

What you’ll learn:

You’ll learn how to build systems that are secure, reliable, and operationally sound by grounding decisions in established AWS principles. For SRE and IR teams, it ties architecture directly to the workflows and playbooks you rely on during incidents.

Speakers: Amazon Web Services, Inc.

Time commitment: 1 hour


Conclusion

As always, AWS re:Invent is packed with more content than any one human can realistically absorb, but the sessions above give you a solid roadmap - especially if you live and breathe reliability, incident response, or the never-ending quest for “just a little more resilience.” Whether you're itching to dive into multi-Region failover patterns, sharpen your on-call muscle, or steal some hard-earned lessons from teams operating at massive scale, these talks are absolutely worth your time.

At the end of the day, the whole point of re:Invent is to come home with ideas you can actually use. So bookmark the sessions that speak to you, block out some mental space to reflect after each one, and get ready to bring back insights your team will thank you for during the next 2 a.m. surprise.

We hope to see you there!

Kate Bernacchi-Sass
Demand Generation Manager