An AI text detector recently flagged Abraham Lincoln's 1863 Gettysburg Address as AI-generated content. The 272-word speech, written nearly 160 years before ChatGPT existed, apparently reads as suspicious to modern detection algorithms.
This is not a new problem, and that is exactly the point. AI detection tools work by analyzing statistical patterns in text: word predictability, sentence structure, and vocabulary distribution. The core issue is that well-structured, formal writing tends to look "predictable" to these models, because clear prose follows patterns that overlap with how large language models generate text. Lincoln wrote with purpose and precision. So does GPT-4. The detector cannot tell the difference.
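To make "word predictability" concrete, here is a toy sketch in Python. It is not how any named detector works internally. It assumes a hypothetical add-one-smoothed word-frequency model built from a tiny reference string, where real tools would score text with a far larger language model. The function names and sample sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under word frequencies taken from `reference`.

    Lower values mean the text looks more "predictable" to the model.
    Add-one smoothing reserves one extra slot for words never seen before.
    """
    ref_counts = Counter(tokenize(reference))
    total = sum(ref_counts.values())
    vocab_size = len(ref_counts) + 1  # +1 for the unseen-word slot
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    log_prob = sum(
        math.log((ref_counts[tok] + 1) / (total + vocab_size)) for tok in tokens
    )
    return math.exp(-log_prob / len(tokens))


# Hypothetical sample data, chosen only to show the contrast.
reference = "the cat sat on the mat the dog sat on the rug"
familiar = "the cat sat on the rug"
unusual = "quantum marmalade negotiates sideways"

ppl_familiar = unigram_perplexity(familiar, reference)  # roughly 6.4
ppl_unusual = unigram_perplexity(unusual, reference)    # about 20
print(f"familiar: {ppl_familiar:.1f}, unusual: {ppl_unusual:.1f}")
```

A detector built on this idea would flag the low-perplexity sentence as the more "machine-like" one, even though a person wrote both. That is the same trap formal, carefully structured prose falls into.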
The false positive problem has been documented repeatedly. OpenAI shut down its own AI text classifier in July 2023, citing its low accuracy: in OpenAI's own evaluation, the tool correctly identified only 26% of AI-written text while mislabeling 9% of human-written text as AI-generated. Independent studies have shown that non-native English speakers get flagged at significantly higher rates, likely because their writing tends to follow more formulaic, predictable patterns.
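Those two rates matter most once you ask how often a flag is actually right, and that depends on how much of the submitted text is AI-written in the first place. The sketch below applies Bayes' rule to the 26% and 9% figures. The AI shares in the loop are hypothetical, and `flag_precision` is an illustrative name, not part of any tool.

```python
def flag_precision(tpr: float, fpr: float, ai_share: float) -> float:
    """Probability that a flagged text really is AI-written (Bayes' rule).

    tpr: share of AI-written text the detector flags
    fpr: share of human-written text the detector flags
    ai_share: assumed share of all submitted text that is AI-written
    """
    true_flags = tpr * ai_share
    false_flags = fpr * (1.0 - ai_share)
    return true_flags / (true_flags + false_flags)


# Rates are those reported for OpenAI's classifier; AI shares are assumptions.
for ai_share in (0.05, 0.10, 0.25, 0.50):
    precision = flag_precision(tpr=0.26, fpr=0.09, ai_share=ai_share)
    print(f"AI share {ai_share:.0%}: {precision:.0%} of flags are correct")
```

Under the assumption that one in ten submissions is AI-written, only about 24% of flags point at AI text. The other three quarters land on people who wrote their own work.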
Despite all this evidence, AI detectors remain embedded in university honor code processes and some hiring pipelines. Tools such as Turnitin and GPTZero market themselves with confidence scores that suggest a precision their underlying technology cannot deliver. A tool that thinks Lincoln used ChatGPT should not be deciding whether a student gets expelled for academic dishonesty.
The deeper problem is that there is no reliable statistical fingerprint that separates human writing from AI writing. Watermarking (where AI providers embed hidden signals in generated text) shows more promise, but it works only if every AI provider participates, and even then users can strip watermarks by paraphrasing. Until that changes, treating any detector output as evidence rather than a rough guess is a mistake with real consequences for real people.
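For a sense of why paraphrasing defeats a watermark, here is a toy sketch of one style of scheme: the generator only picks words from a "green" half of the vocabulary, chosen by hashing the previous word, and the detector counts how many words land in that half. This is an assumption-laden illustration, not any provider's actual method. The vocabulary, function names, and the crude "paraphrase" (replacing every word at random) are all hypothetical.

```python
import hashlib
import math
import random

# Hypothetical 16-word vocabulary, just large enough to demonstrate the idea.
VOCAB = ["time", "people", "nation", "war", "field", "work", "ground", "world",
         "cause", "freedom", "earth", "power", "place", "lives", "task", "honor"]


def green_list(prev_word: str) -> set[str]:
    """Return exactly half of VOCAB, selected by a hash keyed on prev_word."""
    ranked = sorted(
        VOCAB,
        key=lambda w: hashlib.sha256(f"{prev_word}|{w}".encode()).hexdigest(),
    )
    return set(ranked[: len(VOCAB) // 2])


def generate_watermarked(length: int, rng: random.Random) -> list[str]:
    """Generate words so that each one is in the green list of its predecessor."""
    words = [rng.choice(VOCAB)]
    while len(words) < length:
        words.append(rng.choice(sorted(green_list(words[-1]))))
    return words


def watermark_z_score(words: list[str]) -> float:
    """Standard deviations by which green hits exceed the 50% chance level."""
    if len(words) < 2:
        raise ValueError("need at least two words to score")
    scored = len(words) - 1  # the first word has no predecessor
    hits = sum(cur in green_list(prev) for prev, cur in zip(words, words[1:]))
    return (hits - 0.5 * scored) / math.sqrt(0.25 * scored)


rng = random.Random(0)
marked = generate_watermarked(65, rng)
# Crude stand-in for paraphrasing: every word replaced without regard to the key.
paraphrased = [rng.choice(VOCAB) for _ in marked]

z_marked = watermark_z_score(marked)            # all 64 scored words are green
z_paraphrased = watermark_z_score(paraphrased)  # hits fall back toward chance
print(f"watermarked z = {z_marked:.1f}, paraphrased z = {z_paraphrased:.1f}")
```

The watermarked sequence scores far above chance because every word was chosen to match the key. Once the words are swapped out, the signal has nothing left to hold on to, which is the weakness described above.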