The 'Delve' Problem: Why Making AI Sound Human Is Still Unsolved

March 20, 2026

You can spot AI-written content in about three seconds. "Delve," "testament," "in today's fast-paced landscape" - these verbal tics have become so associated with large language models that they're practically watermarks. And despite billions of dollars flowing into AI writing tools, nobody has fully cracked the problem.

The challenge is getting renewed attention as more AI content tools hit the market. Builders working on AI writing products report that the hardest engineering problem isn't infrastructure or scaling - it's getting the models to stop sounding like a corporate press release.

The Techniques That Partially Work

The most promising approach right now combines three tactics:

RAG with personal writing samples: retrieval-augmented generation (RAG) feeds passages from a user's past writing into the model as reference material, so the output mirrors their actual voice instead of defaulting to a generic AI tone. Think of it as giving the AI a style guide written entirely in your own words.
Negative prompting: Explicitly telling the model NOT to use certain words and phrases. The "delve" ban list has become something of an industry standard at this point.
Few-shot examples: Showing the model 3-5 examples of good output before asking it to generate. This works better than describing what you want in the abstract.
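
To make the three tactics concrete, here is a minimal Python sketch of how they might be combined into a single prompt. Everything in it is an assumption for illustration: the hypothetical function names (`retrieve_samples`, `build_prompt`), the ban list, and especially the retriever. The retriever ranks past writing by simple word overlap, whereas a production RAG system would more likely use embeddings and a vector store. The sketch only builds the prompt text; sending it to a model is left out.

```python
import re

# Illustrative ban list built from the tics named in this article.
BANNED_PHRASES = ["delve", "testament", "in today's fast-paced landscape"]


def tokenize(text: str) -> set[str]:
    """Lowercased word set, used by the toy retriever below."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve_samples(query: str, samples: list[str], k: int = 2) -> list[str]:
    """Hypothetical stand-in for RAG retrieval: rank the user's past writing
    by word overlap with the request and keep the top k."""
    q = tokenize(query)
    ranked = sorted(samples, key=lambda s: len(q & tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(request: str, samples: list[str],
                 examples: list[tuple[str, str]]) -> str:
    parts = []
    # 1. RAG: retrieved passages of the user's own writing as a style reference.
    parts.append("Match the voice of these samples of the user's writing:")
    parts.extend(f"---\n{s}" for s in retrieve_samples(request, samples))
    # 2. Negative prompting: an explicit ban list.
    banned = ", ".join(f'"{p}"' for p in BANNED_PHRASES)
    parts.append(f"Do NOT use these words or phrases: {banned}")
    # 3. Few-shot: a handful of request/response pairs showing good output.
    for i, (req, resp) in enumerate(examples, start=1):
        parts.append(f"Example {i}\nRequest: {req}\nResponse: {resp}")
    parts.append(f"Request: {request}\nResponse:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    past = ["I ship code on Fridays and regret it.",
            "Gardening tips for spring tomatoes.",
            "Fridays are for shipping small fixes."]
    shots = [("Announce the outage fix.", "It's fixed. Sorry about Tuesday.")]
    print(build_prompt("write about shipping code on Fridays", past, shots))
```

With the sample data above, the gardening note shares no words with the request, so it is the one sample left out of the prompt.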

Each technique helps. Combined, they get you maybe 70-80% of the way there. But the remaining 20-30% - the subtle cadence, the occasional sentence fragment, the willingness to be blunt - remains stubbornly difficult to replicate.

Why This Stays Hard

The core issue is that large language models are trained on enormous datasets that skew heavily toward formal, polished writing. Wikipedia entries, news articles, corporate blogs, academic papers - these sources share a certain register that the model internalizes as "correct." When you ask it to write differently, you're fighting against the statistical gravity of its entire training set.

There's also a measurement problem. How do you programmatically detect whether a piece of text sounds "too AI"? You can check for banned words, but robotic writing isn't just about individual words. It's about sentence structure, paragraph flow, the ratio of hedging to directness. Building automated quality checks for "voice" is closer to art criticism than software engineering.
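
The contrast between word-level and structure-level checks shows up quickly in code. The Python sketch below assumes hand-picked word lists and a naive regex sentence splitter, and every name in it is hypothetical. It counts banned phrases alongside two crude structural proxies:

- the spread of sentence lengths, since uniform lengths are arguably one sign of a monotonous cadence;
- hedge words per sentence.

None of these are validated metrics. They are easy to compute and hard to interpret, which is the measurement problem described here.

```python
import re
import statistics

# Illustrative lists only; a usable checker would need much richer, tuned lists.
BANNED_PHRASES = ["delve", "testament", "in today's fast-paced landscape"]
HEDGES = {"perhaps", "might", "may", "arguably", "potentially", "generally", "somewhat"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def voice_report(text: str) -> dict[str, float]:
    lower = text.lower()
    sentences = split_sentences(text)
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z']+", lower)
    return {
        # Word level (the easy part): substring hits, so "delves" also counts.
        "banned_hits": sum(lower.count(p) for p in BANNED_PHRASES),
        # Structure level: rough proxies for cadence and hedging.
        "mean_sentence_len": statistics.mean(lengths) if lengths else 0.0,
        "sentence_len_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        "hedges_per_sentence": sum(w in HEDGES for w in words) / max(len(sentences), 1),
    }


if __name__ == "__main__":
    print(voice_report("Let us delve into this. It might work. Short."))
```

On the three-sentence string in the `__main__` block, the report shows one banned hit, a mean sentence length of 3 words and one hedge across three sentences. It says nothing about whether the text actually sounds human.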

Some teams are experimenting with classifier models trained specifically to detect AI-sounding patterns, then using those scores as feedback signals during generation. Others are taking a more manual approach: human reviewers scoring outputs and feeding those ratings back into the prompt engineering cycle.
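
One simple way to use a classifier score as a feedback signal is best-of-n reranking with a retry loop, sketched below in Python. This is not a description of any particular team's system.

- `generate` and `ai_score` are hypothetical stand-ins for a model call and a trained AI-pattern classifier.
- The toy versions included here exist only so the loop runs.
- The threshold and the wording of the feedback note are arbitrary choices.

```python
from typing import Callable


def generate_with_feedback(
    prompt: str,
    generate: Callable[[str], list[str]],  # hypothetical: returns n candidate drafts
    ai_score: Callable[[str], float],      # hypothetical classifier: higher = more AI-sounding
    threshold: float = 0.2,                # arbitrary cutoff for this sketch
    max_rounds: int = 3,
) -> tuple[str, float]:
    """Keep the least AI-sounding candidate seen so far; stop once it scores
    at or below `threshold`, otherwise append feedback to the prompt and retry."""
    best, best_score = "", float("inf")
    for _ in range(max_rounds):
        # Best-of-n: score every candidate from this round with the classifier.
        for draft in generate(prompt):
            score = ai_score(draft)
            if score < best_score:
                best, best_score = draft, score
        if best_score <= threshold:
            break
        # Feed the score back into the next round's prompt.
        prompt += (
            f"\n\nThe best draft so far scored {best_score:.2f} on an "
            "AI-likeness detector. Write more plainly and directly."
        )
    return best, best_score


# Toy stand-ins (hypothetical) so the loop runs without a model or classifier.
def fake_generate(prompt: str) -> list[str]:
    if "plainly" in prompt:
        return ["We fixed the bug. Ship it."]
    return ["Let us delve into this testament to progress.", "We delve deeper."]


def toy_score(text: str) -> float:
    # Fraction of words that are banned tics; a real classifier would look at far more.
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in {"delve", "testament"} for w in words) / max(len(words), 1)


if __name__ == "__main__":
    print(generate_with_feedback("Write a release note.", fake_generate, toy_score))
```

In the demo, the first round's best draft scores 0.25, above the threshold. The feedback note then changes the prompt, and the second round's plain draft scores 0.0, which ends the loop.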

What This Means for AI Writing Tools

For anyone using AI writing tools like Copy.ai, Anyword, Writesonic, or even raw ChatGPT and Claude, the practical takeaway is straightforward: the tool alone won't produce content that sounds like you wrote it. Closing the gap between "acceptable first draft" and "sounds like a human" still takes editing.

The tools that will win this market are the ones that solve the voice problem first. Right now, most AI content tools compete on features and integrations. But the real differentiator will be which one can produce text your audience can't identify as machine-generated. That's a much harder bar to clear than adding another template or integration.