
Amazon Apologizes for Major AWS Outage in US-EAST-1 Region


TL;DR

Amazon apologized for the major AWS outage in the Northern Virginia (US-EAST-1) Region. The outage, caused by a latent race condition in the DynamoDB DNS management system, affected services including Amazon DynamoDB, Network Load Balancer, and EC2.

Key Points


Amazon issued an apology to customers affected by the outage, demonstrating a commitment to transparency and accountability.

The outage impacted several AWS services, including Amazon DynamoDB, Network Load Balancer, and EC2 instances, causing significant operational challenges.

The primary cause of the disruption was identified as a latent race condition in the DynamoDB DNS management system.

AWS implemented several measures to recover from the outage; by 3:01 PM PDT on Oct 20 all AWS services had returned to normal operations, and a final status update followed at 3:53 PM PDT.

The sentiment expressed is one of responsibility and transparency, but the inconvenience caused by the outage might overshadow the apology for some users.

Key Numbers

3 distinct periods of impact to customer applications during the outage

141 AWS services impacted during the outage

3 DNS Enactors operating redundantly across three Availability Zones in the N. Virginia Region

Stakeholder Relationships


Organizations

Key entities and stakeholders, categorized for clarity: people, organizations, tools, events, regulatory bodies, and industries.
Amazon Web Services (AWS) - Cloud Service Provider

AWS is responsible for managing the infrastructure and services affected by the outage, and they issued an apology and explanation regarding the incident.

Tools

Amazon DynamoDB - Database Service

DynamoDB experienced increased API error rates due to the DNS management system failure, affecting users.

Network Load Balancer - Load Balancing Service

Users of the Network Load Balancer faced connection errors during the outage, impacting application performance.

EC2 - Cloud Computing Service

EC2 instance users encountered issues with instance launches, affecting their computing resources.

Events

AWS Outage in Northern Virginia (US-EAST-1) - Service Disruption

A significant outage occurred affecting multiple AWS services, leading to Amazon issuing an apology.

Timeline of Events

Timeline of key events and milestones.
Oct 19, 2025, 11:48 PM PDT - AWS outage began

The outage started with increased API error rates in Amazon DynamoDB due to DNS resolution failures.

Oct 20, 2025, 12:26 AM PDT - Trigger identified

The cause of the outage was identified as DNS resolution issues for the regional DynamoDB service endpoints.

Oct 20, 2025, 2:24 AM PDT - DynamoDB DNS issue resolved

The DNS issue affecting DynamoDB was resolved, and services began recovering.

Oct 20, 2025, 5:28 AM PDT - Network Manager update

Network Manager began propagating updated network configurations to newly launched instances.

Oct 20, 2025, 6:52 AM PDT - NLB health check failures detected

Monitoring systems detected increased health check failures in the Network Load Balancer (NLB) service (a simplified sketch of this kind of health check follows the timeline).

Oct 20, 2025, 9:36 AM PDT - NLB connection errors resolved

Engineers disabled automatic DNS health check failover for NLB, resolving the increased connection errors.

Oct 20, 2025, 9:38 AM PDT - NLB health checks recovered

Network Load Balancer health checks recovered.

Oct 20, 2025, 1:50 PM PDT - Full recovery of EC2 APIs

Full recovery of EC2 APIs and new EC2 instance launches was achieved.

Oct 20, 2025, 2:09 PM PDT - DNS health check failover re-enabled

Automatic DNS health check failover was re-enabled for NLB.

Oct 20, 2025, 3:01 PM PDT - AWS services returned to normal

All AWS services returned to normal operations.

Oct 20, 2025, 3:53 PM PDT - Final update on AWS outage

The final update confirmed the resolution of increased error rates and latencies for AWS Services in the US-EAST-1 Region.
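As referenced in the 6:52 AM PDT entry above, the NLB "health check failures" were the load balancer's periodic probes of backend targets failing and taking capacity out of service. The sketch below is a simplified, hypothetical model of a TCP health check loop of that general shape; the targets, port, and unhealthy threshold are illustrative assumptions, not NLB's actual implementation.

```python
# Simplified, hypothetical sketch of what a load-balancer-style TCP health
# check does: connect to each backend target and mark it unhealthy after a few
# consecutive failures. Targets, port, and threshold are illustrative only;
# this is not NLB's actual implementation.
import socket

TARGETS = [("10.0.1.10", 443), ("10.0.1.11", 443)]  # hypothetical backend targets
UNHEALTHY_THRESHOLD = 3
failures = {target: 0 for target in TARGETS}

def check(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in TARGETS:
    target = (host, port)
    if check(host, port):
        failures[target] = 0
        print(f"{host}:{port} healthy")
    else:
        failures[target] += 1
        state = "unhealthy" if failures[target] >= UNHEALTHY_THRESHOLD else "degraded"
        print(f"{host}:{port} check failed ({failures[target]} in a row) -> {state}")
```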

Amazon's been in the spotlight lately, and not for the reasons they'd like. They had to issue an apology after a significant AWS outage rocked the Northern Virginia (US-EAST-1) Region. This wasn't just a minor blip on the radar. Between October 19 and 20, 2025, users faced increased API error rates in Amazon DynamoDB, connection errors with the Network Load Balancer (NLB), and failed EC2 instance launches. The root of all this chaos? A latent race condition in the DynamoDB DNS management system, which led to endpoint resolution failures. In simpler terms, a technical glitch that caused quite a few headaches.
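To make that failure mode a bit more concrete, here is a deliberately simplified, hypothetical Python sketch of this kind of DNS race: two redundant "enactors" apply DNS plans independently, the slower one overwrites a newer plan with a stale one, and a naive cleanup step then empties the record clients are resolving against. AWS has not published its DNS management code, so the names, structure, and cleanup logic below are illustrative assumptions, not the actual implementation.

```python
# Hypothetical illustration of a stale-plan race between two redundant DNS
# "enactors": the slower one applies an older plan on top of a newer one, and
# a naive cleanup step then empties the record clients resolve against.
# Names and logic are illustrative assumptions, not AWS's real implementation.
import threading
import time

dns_record = {"plan_id": 0, "addresses": ["10.0.0.1"]}  # active endpoint record
lock = threading.Lock()  # per-write locking alone does not prevent the bug

def enactor(name, plan_id, addresses, delay):
    time.sleep(delay)  # a slow enactor is still holding an old plan
    with lock:
        # Missing guard: a correct enactor would refuse to apply a plan older
        # than the one currently active.
        dns_record["plan_id"] = plan_id
        dns_record["addresses"] = addresses
        print(f"{name} applied plan {plan_id}")

# The newer plan (id 2) lands first; the stale plan (id 1) arrives late and wins.
fast = threading.Thread(target=enactor, args=("enactor-a", 2, ["10.0.0.2"], 0.1))
slow = threading.Thread(target=enactor, args=("enactor-b", 1, ["10.0.0.1"], 0.5))
fast.start()
slow.start()
fast.join()
slow.join()

# A cleanup job that trusts plan ids rather than what is actually live can now
# delete the addresses clients are resolving against.
if dns_record["plan_id"] < 2:
    dns_record["addresses"] = []

print("final record:", dns_record)  # empty addresses -> DNS resolution failures
```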

The trouble kicked off at 11:48 PM PDT on October 19, when users started noticing those pesky API error rates in DynamoDB. Things went downhill fast, with connection errors in NLB and issues launching EC2 instances. By 2:25 AM PDT on October 20, Amazon had managed to restore DNS information and was deep into recovery efforts. But, as these things often go, the problems didn't just disappear overnight. EC2 instance launches and NLB health checks continued to struggle, causing more network connectivity woes.
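For teams watching recovery from their side, the simplest signal is whether the regional endpoint resolves at all. Below is a minimal sketch of a DNS probe against the public us-east-1 DynamoDB endpoint; the retry count and sleep interval are arbitrary choices for illustration, not anything AWS recommends.

```python
# Minimal DNS probe of the kind an operator might run to confirm that the
# regional DynamoDB endpoint resolves again. The endpoint name is the public
# us-east-1 one; the retry count and sleep interval are arbitrary choices.
import socket
import time

ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

def resolves(hostname: str) -> bool:
    try:
        return len(socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)) > 0
    except socket.gaierror:
        return False

for attempt in range(1, 6):
    if resolves(ENDPOINT):
        print(f"attempt {attempt}: {ENDPOINT} resolves")
        break
    print(f"attempt {attempt}: resolution failed, retrying in 5s")
    time.sleep(5)
```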

Amazon's team was in overdrive throughout the day, trying to fix the mess. They throttled operations and applied various fixes, a bit like trying to patch a leaky boat while still at sea. By 3:01 PM PDT, most AWS services were back to normal, though some were still dealing with backlogs. Amazon has promised to learn from this incident and make changes to prevent it from happening again. It's a classic case of learning the hard way, but hopefully it means smoother sailing in the future.
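On the customer side, the usual takeaway from an event like this is to build in client-side resilience. A minimal sketch, assuming boto3 is available and using a hypothetical table name and key, shows bounded timeouts plus adaptive retries for DynamoDB calls; this is a generic pattern, not something Amazon prescribed in its apology.

```python
# A generic client-side resilience pattern for elevated DynamoDB error rates:
# bounded timeouts plus adaptive retries with backoff. This is a common
# mitigation, not something prescribed in Amazon's post-event summary; the
# table name and key below are hypothetical placeholders.
import boto3
from botocore.config import Config
from botocore.exceptions import BotoCoreError, ClientError

config = Config(
    region_name="us-east-1",
    connect_timeout=5,   # fail fast instead of hanging on an unreachable endpoint
    read_timeout=10,
    retries={"max_attempts": 10, "mode": "adaptive"},  # backoff + client-side rate limiting
)

dynamodb = boto3.client("dynamodb", config=config)

def get_item_safely(table: str, key: dict):
    """Return the item response, or None if the call ultimately fails."""
    try:
        return dynamodb.get_item(TableName=table, Key=key)
    except (BotoCoreError, ClientError) as exc:
        # During a regional event the realistic options are to retry later,
        # serve stale/cached data, or fail over to another Region.
        print(f"DynamoDB call failed: {exc}")
        return None

# Hypothetical usage with a placeholder table and key.
get_item_safely("example-table", {"pk": {"S": "user#123"}})
```

In adaptive mode, botocore also rate-limits retries on the client side, which helps avoid piling retry traffic onto an endpoint that is already struggling.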
