You have probably noticed it: AI-written text is lousy with em dashes. A new comparison test across 27 models and six providers puts actual numbers behind the complaint, and the variation between models is more dramatic than you might expect.
Mike Chambers ran five identical conversational prompts (topics like learning instruments and remote work) through every model available on Amazon Bedrock, keeping temperature and other parameters consistent. He measured the results in em dashes per 100 words.
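To make the metric concrete, here is a minimal sketch of how em dashes per 100 words could be computed for a single response. It is not Chambers's actual script: the function name and the word-counting rule (split on whitespace, with dashes treated as separators) are assumptions chosen for illustration.

```python
# Hypothetical helper, not taken from the original test harness.
EM_DASH = "\u2014"  # written as an escape so the character is unambiguous


def em_dashes_per_100_words(text: str) -> float:
    """Return the number of em dashes per 100 whitespace-separated words."""
    # Assumption: treat each em dash as a separator so that a dash
    # standing alone between spaces is not counted as a word.
    words = text.replace(EM_DASH, " ").split()
    if not words:
        return 0.0
    return 100.0 * text.count(EM_DASH) / len(words)


sample = "Practice daily \u2014 it helps."
print(em_dashes_per_100_words(sample))
```

A different tokenizer would shift the numbers slightly, so scores are only comparable when every model's output is counted the same way.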
The Worst Offenders and the Clean Models
Writer's Palmyra X5 topped the chart at 2.17 em dashes per 100 words, roughly one every other sentence. Anthropic's Claude Haiku, Sonnet, and Opus in the 4.5+ family landed between 1.0 and 1.3 per 100 words. Amazon's own Nova 2 Lite scored in the same range.
Then there are the models that never use em dashes at all. Every Llama model tested (eight variants, including the new Llama 4 Maverick) produced exactly zero em dashes across 40 responses, five prompts per variant. Palmyra X4 also came in at zero.
An odd middle ground: Claude Opus 4.1 and Sonnet 4 avoided em dashes entirely but used five en dashes in their place. Same instinct, slightly different punctuation.
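The en dash result suggests that counting a single character can miss the habit. As a rough sketch, assuming responses are available as plain strings, both characters could be tallied separately; the function and variable names here are hypothetical.

```python
# Hypothetical tally of em and en dashes across a batch of responses.
DASHES = {"em": "\u2014", "en": "\u2013"}


def dash_counts(responses: list[str]) -> dict[str, int]:
    """Count em and en dashes separately across all responses."""
    totals = {name: 0 for name in DASHES}
    for text in responses:
        for name, char in DASHES.items():
            totals[name] += text.count(char)
    return totals


batch = ["Remote work \u2013 for many \u2013 is here to stay.", "No dashes here."]
print(dash_counts(batch))
```

Note that en dashes also appear legitimately in numeric ranges, so a nonzero en dash count is a prompt to look closer, not proof of substitution.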
Training Data, Not Language Modeling, Drives the Habit
The fact that some models obsessively use em dashes while others never produce a single one rules out the popular theory that this is just "how language models write." If it were baked into the architecture, you would see it everywhere.
Chambers points to three likely causes. First, training data composition: models trained heavily on published journalism, academic papers, and professional writing absorb the em dash habits of those sources. Second, RLHF (reinforcement learning from human feedback, the process where human raters score outputs to shape model behavior) creates a reward cycle. If raters perceive em-dash-heavy text as more polished or authoritative, the model learns to produce more of it. Third, there is no keyboard friction for an AI. Humans rarely type em dashes because most keyboards make it awkward. Models have no such barrier: every token is equally easy to generate.
The Llama result is particularly telling. Meta's training pipeline apparently either filtered differently or did not reward that particular stylistic marker, producing text that reads closer to how most people actually write.
What This Means for AI-Generated Content
For anyone using AI tools to draft content, this is practical knowledge. If your output keeps getting flagged as AI-written, em dash density is one of the patterns detectors look for. Switching models is a blunter fix than editing, but knowing which models have the habit and which do not saves you a round of cleanup.
It also highlights something broader: the writing "personality" of each model is not a fixed property of the technology. It is an artifact of curation choices made during training. The same architecture can produce wildly different stylistic output depending on what text it was trained on and what human raters rewarded. Your AI writing assistant's favorite punctuation mark says more about its training pipeline than about language itself.