Amazon Workers Are Gaming AI Metrics Through "Tokenmaxxing"

May 12, 2026

Three months into Amazon's renewed AI adoption push, something unexpected started showing up in the data: employees were consuming enormous quantities of AI tokens without any corresponding increase in meaningful output. The behavior has a name now - tokenmaxxing - and it's spreading.

A token is the basic unit AI language models use to process text. As a rough rule of thumb for English, 1,000 tokens correspond to about 750 words. Amazon tracks AI adoption in part by token volume - how much text its employees feed through tools like Amazon Q Developer or Claude via Bedrock. The idea: more tokens means more usage means more productivity. Employees figured out faster than expected that "more tokens" is easy to manufacture without doing anything useful.
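To make the 1,000-tokens-to-750-words rule of thumb concrete, here is a minimal Python sketch of the conversion. The function names are illustrative, and the ratio is only the approximation quoted above; real tokenizers vary by model and by the text itself.

```python
# Rule of thumb from the article: 1,000 tokens is about 750 English words.
# This is an approximation, not how any specific tokenizer counts.
TOKENS_PER_1000 = 1000
WORDS_PER_1000_TOKENS = 750


def words_to_tokens(words: int) -> int:
    """Estimate how many tokens a text of `words` words consumes."""
    return round(words * TOKENS_PER_1000 / WORDS_PER_1000_TOKENS)


def tokens_to_words(tokens: int) -> int:
    """Estimate how many words a budget of `tokens` tokens covers."""
    return round(tokens * WORDS_PER_1000_TOKENS / TOKENS_PER_1000)


if __name__ == "__main__":
    print(words_to_tokens(750))    # a 750-word memo
    print(tokens_to_words(1000))   # a 1,000-token budget
```

Under this estimate, a 30-word question costs about 40 tokens, which is why volume is so easy to inflate: the size of the input, not its usefulness, drives the count.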

What Tokenmaxxing Actually Looks Like

The behaviors reported by Ars Technica are varied but follow a pattern: pasting in a large irrelevant document before asking a question, running the same query multiple times, generating content that never gets used, building internal automations that call AI APIs in a loop. Each tactic pushes usage numbers up without producing anything. The metric improves. Performance reviews improve. The actual workflow stays the same.
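A back-of-the-envelope sketch shows how cheaply two of these tactics, padding and repetition, move the number. Everything here is hypothetical: the word counts are invented for illustration, the estimate uses the rough 1,000 tokens per 750 words ratio, and only input tokens are counted.

```python
def estimate_tokens(words: int) -> int:
    """Rough estimate: 1,000 tokens per 750 English words."""
    return round(words * 1000 / 750)


def session_tokens(question_words: int, padding_words: int = 0, repeats: int = 1) -> int:
    """Input tokens for one question, optionally padded and re-run.

    padding_words: irrelevant text pasted in before the question.
    repeats: how many times the identical query is submitted.
    """
    return estimate_tokens(question_words + padding_words) * repeats


# Hypothetical numbers: a 30-word question and a 15,000-word unrelated document.
honest = session_tokens(30)
padded = session_tokens(30, padding_words=15_000)
padded_and_repeated = session_tokens(30, padding_words=15_000, repeats=5)

if __name__ == "__main__":
    for label, tokens in [
        ("honest", honest),
        ("padded", padded),
        ("padded + repeated", padded_and_repeated),
    ]:
        print(f"{label:>18}: {tokens:>7} tokens ({tokens // honest}x)")
```

In all three cases the useful output is identical: one answered question. A dashboard that only sees token volume cannot tell them apart.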

This is Goodhart's Law playing out in real time. Once a measure becomes a target, it loses its value as a measure. Token counts were supposed to proxy for genuine AI engagement. They now proxy for an employee's ability to manufacture token counts.

The Real Problem Is the Metric

Amazon has substantial reasons to want high internal AI adoption numbers. The company has invested at least $8 billion in Anthropic, runs AI infrastructure through AWS and Bedrock, and needs to demonstrate internally that its AI bets are translating to workforce productivity. That pressure flows down to managers, then to individual employees, then into behavior that looks like adoption and isn't.

The actual tools aren't the problem. Amazon Q Developer is a capable coding assistant. Claude handles documentation and analysis well. These tools produce real gains for specific tasks - but those gains are uneven, require the right workflow, and can't be mandated uniformly across job functions without producing exactly this kind of compliance theater.

Every company running an AI adoption campaign measured by usage volume rather than outcomes will see some version of this. The useful question isn't "how do we stop tokenmaxxing?" It's "what would a meaningful metric for AI adoption actually look like?" The answer is probably tied to specific task outcomes - time to complete a code review, reduction in document drafts before approval - not the volume of text an AI system processed in a quarter.
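As a rough illustration of what an outcome-based measure could look like, the sketch below compares median code review time with and without AI assistance and reports token volume only as a side figure. The record fields, the function names, and all the numbers are made up for this example; nothing here reflects Amazon's actual data or tooling, and a real comparison would also need to control for task difficulty before drawing conclusions.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class ReviewRecord:
    """One completed code review (hypothetical schema)."""
    used_ai: bool
    minutes_to_complete: float
    tokens_used: int


def median_minutes(records: list[ReviewRecord], used_ai: bool) -> Optional[float]:
    """Median completion time for reviews done with or without AI."""
    times = [r.minutes_to_complete for r in records if r.used_ai == used_ai]
    return median(times) if times else None


def outcome_summary(records: list[ReviewRecord]) -> Optional[dict]:
    """Summarize the outcome (time saved) alongside raw token volume."""
    with_ai = median_minutes(records, used_ai=True)
    without_ai = median_minutes(records, used_ai=False)
    if with_ai is None or without_ai is None:
        return None  # cannot compare without both groups
    return {
        "median_with_ai": with_ai,
        "median_without_ai": without_ai,
        "minutes_saved": without_ai - with_ai,
        "total_tokens": sum(r.tokens_used for r in records),
    }


# Invented sample data. The last review burns 250,000 tokens
# yet is the slowest of the AI-assisted ones.
records = [
    ReviewRecord(used_ai=False, minutes_to_complete=50, tokens_used=0),
    ReviewRecord(used_ai=False, minutes_to_complete=60, tokens_used=0),
    ReviewRecord(used_ai=False, minutes_to_complete=70, tokens_used=0),
    ReviewRecord(used_ai=True, minutes_to_complete=40, tokens_used=2_000),
    ReviewRecord(used_ai=True, minutes_to_complete=45, tokens_used=3_000),
    ReviewRecord(used_ai=True, minutes_to_complete=55, tokens_used=250_000),
]

if __name__ == "__main__":
    print(outcome_summary(records))
```

The design choice is that padding a prompt raises `total_tokens` but leaves `minutes_saved` untouched, so the number a manager looks at cannot be moved by tokenmaxxing alone.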
