The pitch for AI agents has always had a cost argument at its center: automate knowledge work for a fraction of what a human costs. A Fortune report citing Microsoft data is making that calculation more complicated.
Microsoft has found that running AI in certain agentic configurations - where models execute multi-step tasks autonomously - can cost more than paying a human employee to do the same work.
Where the Token Costs Stack Up
The core issue is token burn. Every time an AI agent does something - reads a file, calls a tool, reasons through a problem, revises its own output - it processes tokens (the chunks of text that AI models read and generate, billed by volume and typically quoted per million tokens). A simple chat interaction might use 2,000 to 5,000 tokens. A multi-step agent that reads several documents, queries a database, drafts a response, and refines it can burn 50,000 to 200,000 tokens for work a human completes in 20 minutes.
At current API pricing, those numbers add up fast. A customer service rep handling 50 tickets per day at a $45,000 annual salary costs roughly $3.50 per ticket in labor, assuming about 260 working days a year. An AI agent burning 100,000 tokens per resolved ticket at $15 per million tokens costs $1.50 per ticket - but only if it resolves the ticket cleanly, without looping, retrying, or generating intermediate steps that never ship. Agents that stall or operate on long contexts (meaning they pass large documents into the model on every single step) can erase that advantage entirely.
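The per-ticket arithmetic above can be reproduced in a few lines of Python. This is a rough sketch, not Microsoft's methodology: the salary, ticket volume, token count, and token price are the article's figures, while the 260-day working year, the exclusion of benefits and overhead, and the three-attempt retry scenario are assumptions added for illustration.

```python
# Figures from the article: $45,000 salary, 50 tickets/day,
# 100,000 tokens per ticket, $15 per million tokens.
# ASSUMPTION: 260 working days per year, salary only (no benefits or overhead).
WORKDAYS_PER_YEAR = 260


def human_cost_per_ticket(annual_salary: float, tickets_per_day: int,
                          workdays: int = WORKDAYS_PER_YEAR) -> float:
    """Labor cost of one ticket: salary spread over a year of tickets."""
    return annual_salary / (tickets_per_day * workdays)


def agent_cost_per_ticket(tokens_per_attempt: int, price_per_million: float,
                          attempts: float = 1.0) -> float:
    """Token cost of one resolved ticket; retries multiply the token bill."""
    return tokens_per_attempt * attempts * price_per_million / 1_000_000


def break_even_tokens(human_cost: float, price_per_million: float) -> float:
    """Token count at which the agent costs as much as the human."""
    return human_cost * 1_000_000 / price_per_million


human = human_cost_per_ticket(45_000, 50)
clean = agent_cost_per_ticket(100_000, 15.0)
# ASSUMPTION: a hypothetical run that needs three full attempts to resolve.
retried = agent_cost_per_ticket(100_000, 15.0, attempts=3)
limit = break_even_tokens(human, 15.0)

print(f"human:       ${human:.2f} per ticket")    # $3.46
print(f"agent clean: ${clean:.2f} per ticket")    # $1.50
print(f"agent x3:    ${retried:.2f} per ticket")  # $4.50
print(f"break-even:  {limit:,.0f} tokens")        # 230,769
```

Under these assumptions the agent stops being cheaper at roughly 230,000 tokens per ticket, just above the top of the 50,000 to 200,000 range quoted earlier, so a single retry on a heavy task is enough to flip the comparison.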
The Measurement Problem Nobody Planned For
Microsoft is an unusual messenger here. The company has invested over $13 billion in OpenAI and sells Copilot across its entire product line. When it flags cost problems with agents, that is less a bearish call than a sign that the industry is moving from "AI is inherently cheap" to "AI ROI depends on whether you measured the right things."
The practical guidance for businesses building or evaluating agentic workflows is fairly concrete: model your token costs before committing to an architecture. Caching repeated context (storing results so the agent doesn't re-read the same documents on every pass), limiting how many iterations an agent can take before stopping, and routing simple subtasks to smaller, cheaper models can all cut costs without meaningfully sacrificing output quality.
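To see how those three controls interact, a toy cost model can help before any architecture is chosen. The sketch below is illustrative only: the `Step` type, the `run_cost` function, the two price tiers, the 90% discount on cached context, and the ten-step workload are all hypothetical assumptions, not figures from Microsoft or any vendor's price list. It also assumes a run cut off by the step cap is either finished or handed to a human, which a real system would need to verify.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative prices in dollars per million tokens, not vendor quotes.
PRICE_PER_MILLION = {"large": 15.0, "small": 1.0}


@dataclass
class Step:
    context_tokens: int  # documents re-sent to the model on this step
    work_tokens: int     # fresh reasoning and output tokens for this step
    simple: bool         # True if a smaller model could plausibly handle it


def run_cost(steps, max_steps=None, cache_discount=0.0, route_simple=False):
    """Estimate the dollar cost of one agent run under the given controls."""
    total = 0.0
    # Iteration cap: steps beyond max_steps are never executed (None = no cap).
    for i, step in enumerate(steps[:max_steps]):
        # Routing: send simple subtasks to the cheaper model tier.
        model = "small" if (route_simple and step.simple) else "large"
        # Caching: after the first step, repeated context is billed at a discount.
        discount = cache_discount if i > 0 else 0.0
        billed = step.context_tokens * (1 - discount) + step.work_tokens
        total += billed * PRICE_PER_MILLION[model] / 1_000_000
    return total


# Hypothetical workload: 10 steps, each re-sending 20,000 tokens of context
# and producing 2,000 new tokens; every other step is a simple subtask.
steps = [Step(20_000, 2_000, simple=(i % 2 == 1)) for i in range(10)]

baseline = run_cost(steps)
controlled = run_cost(steps, max_steps=6, cache_discount=0.9, route_simple=True)

print(f"no controls:  ${baseline:.2f}")    # $3.30
print(f"all controls: ${controlled:.2f}")  # $0.46
```

The exact savings depend entirely on the assumed inputs; the point of a model like this is to make the token bill a design parameter that can be compared against the human cost per task before deployment.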
The comparison to human labor also depends entirely on what the agent is doing. High-volume, low-complexity work - extracting data from structured forms, categorizing identical-format emails - is where AI wins on cost decisively. Complex, judgment-heavy workflows requiring multiple rounds of reasoning are where that advantage shrinks or disappears. Microsoft's data is a useful correction to the assumption that one automatically beats the other.