6.3 million orders. That's how many Amazon lost on March 5, when its North American marketplace saw a 99% drop in order volume. Three days earlier, on March 2, a separate incident knocked out 120,000 orders and generated 1.6 million website errors. At least one of these failures has been linked to code produced with the help of Amazon's own AI coding assistant, Q.
Amazon is now rolling out a 90-day safety reset across approximately 335 Tier-1 systems - the critical, customer-facing services that keep the shopping platform running.
What Went Wrong
The failures weren't caused by a single catastrophic bug. According to Dave Treadwell, Amazon's senior VP of e-commerce services, incident frequency has been climbing "since the third quarter of 2025." The root causes point to systemic gaps in how AI-generated code was being reviewed and deployed:
- Engineers could push configuration changes without automated validation
- Required two-engineer code approvals were being bypassed
- "High blast radius changes" - modifications that touch many systems at once - were being deployed without proper safeguards
- Single operators could make changes to production systems without a second pair of eyes
The March 2 incident, which involved Amazon's Q coding assistant, is a concrete example of what happens when AI-generated code moves through a pipeline that lacks adequate guardrails. AI coding tools are fast, but speed without oversight is just a more efficient way to break things.
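To make two of the gaps above concrete, here is a minimal sketch of an automated pre-deploy check that validates a configuration change and flags a high blast radius. It is an illustration only, not a description of Amazon's tooling: the `ConfigChange` type, the required keys, and the threshold of five services are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: a change touching more distinct services than this
# is treated as "high blast radius".
HIGH_BLAST_RADIUS_THRESHOLD = 5

# Hypothetical schema: keys every config change must define.
REQUIRED_KEYS = {"timeout_seconds", "max_retries"}


@dataclass
class ConfigChange:
    author: str
    affected_services: list
    settings: dict


def validate_settings(settings):
    """Return a list of problems; an empty list means the settings pass."""
    problems = []
    for key in sorted(REQUIRED_KEYS - set(settings)):
        problems.append(f"missing required key: {key}")
    for key in sorted(REQUIRED_KEYS & set(settings)):
        value = settings[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{key} must be a non-negative integer")
    return problems


def is_high_blast_radius(change):
    """True when the change touches more distinct services than the threshold."""
    return len(set(change.affected_services)) > HIGH_BLAST_RADIUS_THRESHOLD


def check_change(change):
    """Run validation and the blast-radius check; return all blocking problems."""
    problems = validate_settings(change.settings)
    if is_high_blast_radius(change):
        problems.append("high blast radius: needs extra safeguards before deploy")
    return problems
```

A pipeline built along these lines would refuse to deploy whenever `check_change` returns a non-empty list, so the check does not depend on the author remembering to run it.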
The 90-Day Fix
Amazon's response adds what Treadwell calls "controlled friction" to the deployment process. The new rules require at least two engineer reviews before any code hits production, enhanced documentation for changes, additional approval layers, and executive-level audits of production code changes. System owners must personally sign off on modifications to critical infrastructure.
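As a rough illustration of what such a gate could look like in code, the sketch below checks three of the rules described above: two reviewers other than the author, documentation attached, and system-owner sign-off for critical systems. The `ChangeRequest` fields and the `gate` function are hypothetical names chosen for this example; executive audits and additional approval layers are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChangeRequest:
    # All field names here are hypothetical, chosen for illustration.
    author: str
    approvers: set = field(default_factory=set)
    touches_tier1: bool = False
    system_owner: Optional[str] = None
    has_documentation: bool = False


def gate(change):
    """Return the reasons a change is blocked; an empty list means it may deploy."""
    reasons = []

    # The author's own approval does not count toward the two-reviewer rule.
    independent = change.approvers - {change.author}
    if len(independent) < 2:
        reasons.append("needs at least two reviewers other than the author")

    if not change.has_documentation:
        reasons.append("missing change documentation")

    # Critical systems additionally require the owner's personal sign-off.
    if change.touches_tier1 and change.system_owner not in change.approvers:
        reasons.append("system owner has not signed off")

    return reasons
```

For example, a change authored by `alice` and approved only by `alice` and `bob` would be blocked, because self-approval is excluded from the reviewer count. Encoding the rule as a deterministic check is what makes the friction "controlled": the gate gives the same answer every time, regardless of who wrote or generated the code.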
Treadwell framed the challenge as needing to "unite AI-based agentic protection with more predictable, rules-based deterministic ones" - in plain terms, pairing AI-powered safeguards with predictable, rule-based checks and the human review that enforces them.
A Warning for Every Team Using AI Code Assistants
This is arguably the most significant publicly reported production failure yet tied to AI coding tools. Amazon isn't a startup moving fast and breaking things - it's a company that processes billions of dollars in transactions, and its own AI coding tool has been linked to at least one outage on its own marketplace.
The lesson here isn't that AI coding assistants are dangerous. It's that organizations are adopting them faster than they're updating their review and deployment processes to account for the volume and nature of AI-generated code. When a human writes code, they typically understand the system context and potential blast radius. When an AI assistant generates code, that contextual awareness may be missing, making human review more important, not less.
Amazon's 90-day reset is essentially an admission that its review and deployment processes had not kept pace with its adoption of AI tools. Every engineering team using tools like Cursor, GitHub Copilot, Amazon Q, or Claude Code should be asking the same question: are our code review processes designed for the volume and speed of AI-assisted development, or are we running on the same guardrails we had before?