178 AI Models Fingerprinted: 9 Clone Clusters Found at 90%+ Writing Similarity

178 models. 3,095 standardized responses. 43 prompts designed to reveal how a model actually thinks and writes, not just whether it can answer. The team at Rival Tips analyzed the stylometric fingerprint (writing style signature) of nearly every significant AI model available, and the headline finding is this: 9 distinct "clone clusters" where models score above 90% similarity to each other - meaning they write in ways that are practically indistinguishable.

The methodology is solid. For each model, they extracted a 32-dimension fingerprint measuring lexical richness (vocabulary diversity), sentence structure patterns, punctuation habits, formatting choices, and discourse markers (phrases like "it's worth noting" or "in conclusion" that signal how a model structures its reasoning). Each dimension was normalized so that a model's tendency toward long sentences doesn't swamp its punctuation habits in the final comparison.
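As a rough illustration of that extract-then-normalize step, here is a minimal Python sketch. It is not Rival Tips' code: the four toy features, the `fingerprint` and `zscore_columns` names, and the choice of z-score normalization are all assumptions made for the example, standing in for the study's 32 dimensions and whatever normalization it actually used.

```python
import re
from statistics import mean, pstdev

# Only the two example phrases quoted in the article; the study's real list is not given.
DISCOURSE_MARKERS = ("it's worth noting", "in conclusion")


def fingerprint(text: str) -> list[float]:
    """Toy 4-feature fingerprint (hypothetical stand-in for the 32 dimensions)."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return [
        len(set(words)) / n_words,          # lexical richness (type-token ratio)
        n_words / max(len(sentences), 1),   # sentence structure: mean words per sentence
        text.count(",") / n_words,          # punctuation habit: commas per word
        sum(lowered.count(m) for m in DISCOURSE_MARKERS) / n_words,  # markers per word
    ]


def zscore_columns(rows: list[list[float]]) -> list[list[float]]:
    """Rescale each feature across all models to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # constant feature: avoid dividing by zero
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]
```

Without the rescaling, a feature measured in words per sentence (values around 20) would dominate one measured in commas per word (values around 0.05) in any later distance or similarity calculation.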

Nine Clusters of Models That Write Alike

Nine groups of models writing at over 90% cosine similarity - a standard measure of how closely two vectors point in the same direction in multi-dimensional space - is a bigger result than it first appears. These aren't just models from the same company. The clustering cuts across providers, base model versions, and price tiers. When two models score above 90% on a composite of 32 stylometric signals, you're looking at outputs that both casual users and experienced practitioners would struggle to tell apart.
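For readers who want to see what a 90% threshold means mechanically, the sketch below computes cosine similarity between fingerprint vectors and groups models whose pairwise score clears the threshold. The article does not say how the clusters were actually formed, so the connected-components grouping, the `clone_clusters` name, and the placeholder model names are assumptions for illustration only.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as similar to nothing


def clone_clusters(fingerprints: dict[str, list[float]], threshold: float = 0.90):
    """Group models linked by pairwise similarity >= threshold (assumed method)."""
    parent = {name: name for name in fingerprints}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if cosine_similarity(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    # Only groups with at least two members count as clusters.
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical 2-dimensional fingerprints, purely for demonstration.
demo = {
    "model_a": [1.0, 0.0],
    "model_b": [0.99, 0.1],
    "model_c": [0.0, 1.0],
    "model_d": [0.05, 1.0],
}
clusters = clone_clusters(demo)
```

One consequence of grouping by connected components is that two models can land in the same cluster without scoring above the threshold against each other directly, as long as a chain of close neighbors links them. A stricter definition would require every pair in a cluster to clear 90%.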

The specific Mistral finding is the sharpest data point: Mistral Large 2 and Mistral Large 3 (released in 2025) score 84.8% on a composite metric combining five independent similarity signals. For a company that markets successive generations as meaningful upgrades, that's an uncomfortable number. The writing style - how the model formulates explanations, structures responses, uses language - didn't change substantially between releases even if benchmark performance shifted.

The Gemini-Claude Overlap

The most surprising individual finding: Gemini 2.5 Flash Lite scores 78% similarity with Claude 3 Opus. These are models from different companies, built on different architectures (the internal structure of how the model processes and generates text). The similarity likely reflects shared training data - both ingested enormous quantities of the same internet text - combined with similar RLHF (reinforcement learning from human feedback) training approaches that push models toward agreeable, helpful responses. The result is convergent writing styles from technically different origins.

This has real implications for anyone using multiple models as a check on each other. If you're routing the same question to Gemini and Claude to get two independent perspectives, 78% stylometric similarity suggests you're getting less genuine independence than you'd assume, at least in how the answers are framed and worded.

What This Means When Choosing a Model

The research doesn't say all models are identical - benchmark performance, reasoning capability, and task-specific performance genuinely vary. But writing style is how model outputs feel in practice: readable or dense, structured or fluid, hedged or direct. If you're choosing between models based on how they communicate rather than raw task performance, the 9 clone clusters suggest you have fewer real choices than 178 available models implies.

For content work, customer-facing copy, or anything where voice matters: actually test writing style when selecting a model. Running the same prompt through five models and getting back five similar-feeling responses is data. The full dataset - 3,095 responses across 178 models - is available at rival.tips/research/model-similarity.