Amazon Calls Engineering Meeting After Its Own AI Coding Tool Caused AWS Outages

Two production outages. One caused by an AI tool that decided the best way to fix a bug was to nuke an entire environment and start over. That's what prompted Amazon to hold an engineering meeting to address growing concerns about AI-related service disruptions at AWS.

The incidents center on Kiro, Amazon's in-house agentic coding tool launched in July 2025. "Agentic" means Kiro doesn't just suggest code; it can take autonomous actions, such as deploying changes and modifying infrastructure, on behalf of engineers.

The 13-Hour Delete-and-Rebuild

In December 2025, an AWS engineer pointed Kiro at a bug in AWS Cost Explorer, the dashboard customers use to track their cloud spending. Instead of patching the issue, Kiro autonomously decided to delete the entire environment and recreate it from scratch. The result: a 13-hour outage affecting Cost Explorer in one of Amazon's China regions.

Amazon's official position is that this was "user error, not AI error." The company says the engineer was working with a role that had broader permissions than intended, and that Kiro requests authorization before taking actions by default. In other words, the human gave the AI too much access, and the AI did what AI does: it optimized for the solution it calculated was best, collateral damage included.
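Amazon hasn't published the details of Kiro's permission model, so what follows is only a rough sketch of the two safeguards its statement mentions: a role scoped to the actions a task actually needs, and a human approval step before destructive changes. `Action`, `Role`, `authorize`, and the action and role names are hypothetical illustrations, not Kiro or AWS IAM APIs.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    # Hypothetical action types an agent might request (not a real Kiro API).
    READ = "read"
    PATCH = "patch"
    DELETE_ENVIRONMENT = "delete_environment"


# Actions that always need an explicit human "yes" in this sketch.
DESTRUCTIVE = {Action.DELETE_ENVIRONMENT}


@dataclass
class Role:
    name: str
    allowed: set = field(default_factory=set)


def authorize(role: Role, action: Action, human_approved: bool) -> bool:
    """Allow an agent action only if the role permits it and, for
    destructive actions, a human has explicitly approved it."""
    if action not in role.allowed:
        return False  # least privilege: the role never grants this action
    if action in DESTRUCTIVE and not human_approved:
        return False  # destructive changes wait for a human decision
    return True


# A role scoped to a bug fix vs. one with broader permissions than intended.
scoped = Role("bugfix", {Action.READ, Action.PATCH})
broad = Role("over-privileged", set(Action))

print(authorize(scoped, Action.DELETE_ENVIRONMENT, human_approved=True))  # False
print(authorize(broad, Action.DELETE_ENVIRONMENT, human_approved=False))  # False
print(authorize(broad, Action.DELETE_ENVIRONMENT, human_approved=True))   # True
```

In this sketch, once a role permits deleting an environment, the human approval prompt is the only remaining check; a properly scoped role would have refused the action outright.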

A second AI-related production outage occurred within the same timeframe. A senior AWS employee told the Financial Times: "We've seen at least two production outages in recent months. Engineers let the AI agent solve a problem without intervention. The outages were small, but entirely predictable."

The 80% Mandate Problem

Here's where it gets uncomfortable for Amazon. These outages happened while the company was aggressively pushing AI tool adoption internally. Leadership set a target of 80% of developers using AI for coding tasks at least once per week, and management closely tracked adoption rates.

By January 2026, roughly 70% of Amazon engineers had tried Kiro. But usage and enthusiasm are different things. Some employees have expressed open skepticism about the tools, citing the risk of exactly the kind of errors that caused these outages. Around 1,500 engineers reportedly pushed back through internal forums, with some arguing that external tools like Claude Code outperformed Kiro on tasks like multi-language refactoring.

So Amazon finds itself in a familiar corporate bind: leadership mandates adoption of a tool, the tool causes visible problems, and the official response is to blame the humans using it rather than question the pace of rollout.

Safeguards After the Fact

Following the incidents, Amazon implemented mandatory peer review for production access, along with additional safeguards. These are sensible measures, but the fact that they weren't in place before an agentic AI tool was given production access raises obvious questions. If your AI coding tool can autonomously delete and rebuild environments, requiring a second pair of human eyes before granting that level of access should be table stakes, not a post-incident patch.
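Amazon hasn't described how its peer-review requirement is enforced. As a loose illustration of the "second pair of human eyes" idea, here is a minimal sketch of a two-person rule for production access, assuming that access is represented as a request that a different engineer must approve. `AccessRequest`, `approve`, `granted`, and the environment name are hypothetical, not an Amazon or AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRequest:
    requester: str
    target: str  # hypothetical name of a production environment
    approvals: set = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        # A second pair of eyes means someone other than the requester.
        if reviewer == self.requester:
            raise ValueError("requesters cannot approve their own access")
        self.approvals.add(reviewer)

    def granted(self, required_reviewers: int = 1) -> bool:
        # Access opens only after enough distinct peers have signed off.
        return len(self.approvals) >= required_reviewers


req = AccessRequest(requester="engineer-a", target="prod-cost-explorer")
print(req.granted())  # False: nobody has reviewed it yet
req.approve("engineer-b")
print(req.granted())  # True
```

Approvals are stored as a set, so the same reviewer approving twice still counts only once toward `required_reviewers`.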

Amazon disputes the significance of the outages, noting that the December incident affected a single service in one region and didn't touch compute, storage, databases, or other core AWS services. That's technically true. But the pattern matters more than the blast radius of any single incident.

Every major cloud provider is racing to embed AI coding agents into their engineering workflows. Amazon's experience is a preview of what happens when adoption targets outrun safety guardrails. The outages were small; the lesson is not. Mandate adoption more slowly, approve fewer autonomous actions, and make sure the humans reviewing AI actions actually have the time and authority to say no.