The people best trained to judge writing quality are struggling to catch AI-generated text. Literary editors, who are trained to identify inconsistent voice, tonal gaps, and hollow specificity, increasingly fail to distinguish AI-written manuscripts from human ones. AI output now shows so few of the flaws editors are trained to catch that the problem has become a practical one for publishing.
What Editors Are Actually Looking For
Professional editors train themselves to spot the symptoms of careless or artificial writing: characters who act inconsistently, words used without quite being understood, and the rhythmic flatness of text that sounds correct but feels as if nobody lived it.
Early AI writing failed these checks conspicuously. GPT-3-era outputs had a recognizable signature: confident but vague, structurally coherent but weirdly unspecific, grammatically clean but rhythmically monotonous. Editors who read enough of it developed an instinct for it.
That instinct is no longer reliable. Models trained on hundreds of billions of words of human text have absorbed enough stylistic range to mimic almost any register. RLHF (reinforcement learning from human feedback), a training method in which human raters compare or score outputs and the model learns to produce the text those raters prefer, specifically optimizes for text that humans judge to be high quality. In effect, the outputs are optimized to satisfy the same human judgment editors rely on.
The Detection Tool Problem
Automated AI detectors haven't filled the gap. Tools like GPTZero and Turnitin's AI detection have documented false positive rates: they flag human-written text as AI-generated. A 2024 study found Turnitin incorrectly flagged real student work in roughly 4% of cases. At that rate, 1,000 human-written submissions would yield about 40 false accusations, a serious error rate for high-stakes decisions.
The result: editors can trust neither their reading instincts nor the tools.
The Uncomfortable Implication
Here's what this conversation often avoids: a lot of what editors are trained to catch is surface-level style. Consistent voice. Grammatical coherence. Avoiding stock phrases. These are trainable qualities, and AI has been trained on them.
The harder things - original observation, genuine specificity, the sentence that could only come from the person who lived it - are still difficult for AI to produce consistently. But they're also what separates competent writing from distinctive writing, and most submissions never clear that bar anyway.
If editors are failing to catch AI text, part of the reason is that the text is genuinely difficult to distinguish from average human writing. That's a statement about how good AI writing has gotten. It's also a statement about the baseline quality of most submissions.
Publishers are responding not by improving detection but by changing submission structures: known contributors over open calls, author relationships over cold manuscripts, thematic constraints that demand specificity over generic pitches. This is not because AI writing is impossible to catch, but because AI makes manuscripts cheap to produce in bulk, and no detection tool can reliably screen that volume.