
ETH Zurich Study: LLMs Can Identify Anonymous Users for $4 a Person

April 3, 2026

Four dollars. That is roughly the upper end of what it costs to have an LLM work out who you are from your anonymous internet posts.

A team of researchers from ETH Zurich, MATS Research, and Anthropic published a study demonstrating that large language models can de-anonymize pseudonymous accounts across platforms like Hacker News and Reddit with startling accuracy. The results put hard numbers on something many privacy advocates have warned about in theory: your writing style, topic interests, and posting patterns are a fingerprint, and AI is very good at reading fingerprints.

The Numbers

The researchers built a four-stage pipeline called ESRC (Extract, Search, Reason, Calibrate) and tested it against 338 Hacker News users who had confirmed LinkedIn profiles. The system correctly identified 226 of them, a 67% recall (the share of all users it identified correctly) with 90% precision (meaning that when it made a guess, it was right 9 times out of 10). Only 25 identifications were wrong, and the system abstained 86 times rather than guess.
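The headline percentages can be checked directly from the counts quoted above. The following is a minimal sketch of that arithmetic, not code from the study; the function and variable names are chosen here for illustration.

```python
# Counts as reported in the article for the Hacker News experiment.
# Note: correct + wrong + abstained sums to 337, one short of the 338
# users stated; recall below is taken over the stated 338.
total_users = 338
correct = 226
wrong = 25
abstained = 86


def precision(n_correct: int, n_wrong: int) -> float:
    """Share of guesses that were right. Abstentions are not guesses."""
    return n_correct / (n_correct + n_wrong)


def recall(n_correct: int, n_total: int) -> float:
    """Share of all users that were correctly identified."""
    return n_correct / n_total


p = precision(correct, wrong)
r = recall(correct, total_users)
print(f"precision = {p:.1%}")  # precision = 90.0%
print(f"recall    = {r:.1%}")  # recall    = 66.9%
```

Abstaining lowers recall but protects precision, which is why the two figures differ so much.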

For Reddit cross-community matching, the system hit up to 45% recall at 99% precision. Against anonymized interview transcripts, it identified 9 of 33 scientists at 82% precision.

The entire experiment cost less than $2,000. Individual identifications ran between $1 and $4 each.

450x Better Than Classical Methods

The LLM approach represents a 450-fold improvement over classical de-anonymization techniques, which typically rely on metadata analysis or stylometric tools (software that analyzes writing style patterns like sentence length, vocabulary choices, and punctuation habits). The difference is that LLMs can reason about content, context, and cross-platform behavioral patterns simultaneously, not just count word frequencies.
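To make the contrast concrete, here is a rough sketch of the kind of surface features a classical stylometric tool might compute. The three features and the function name are illustrative assumptions, not the baselines used in the study; real stylometry tools use far richer feature sets.

```python
import re
import string


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few simple, hypothetical writing-style features."""
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Count ASCII punctuation characters.
    punct = sum(1 for ch in text if ch in string.punctuation)
    return {
        # Sentence length: average words per sentence.
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        # Vocabulary choice: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Punctuation habits: punctuation marks per character.
        "punct_per_char": punct / len(text) if text else 0.0,
    }


features = stylometric_features("I like cats. I like dogs!")
print(features["avg_sentence_len"])  # 3.0
```

Features like these capture how someone writes but nothing about what they say, which is the gap the article attributes to LLM-based reasoning.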

The research team included Simon Lermen from MATS Research, Daniel Paleka, Joshua Swanson, Michael Aerni, and Florian Tramèr from ETH Zurich, plus Nicholas Carlini from Anthropic. The involvement of an Anthropic researcher is notable given that Anthropic builds one of the very models capable of performing this kind of analysis.

What This Means for You

The practical threat model here is not that someone will spend $4 to find your Reddit throwaway account. It is that this technique scales. A government targeting journalists, a corporation profiling employees, or a bad actor running social engineering campaigns can now automate identification at a cost that is negligible on an organizational budget.
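A back-of-the-envelope calculation shows why scale is the concern. The sketch below assumes the reported $1 to $4 per identification stays constant as the number of targets grows, which the article does not confirm; the function name and the 10,000-account figure are hypothetical.

```python
def identification_cost_range(
    n_targets: int,
    low_per_target: float = 1.0,   # reported lower bound, in dollars
    high_per_target: float = 4.0,  # reported upper bound, in dollars
) -> tuple[float, float]:
    """Return (low, high) total cost, assuming cost scales linearly."""
    if n_targets < 0:
        raise ValueError("n_targets must be non-negative")
    return n_targets * low_per_target, n_targets * high_per_target


low, high = identification_cost_range(10_000)
print(f"10,000 accounts: ${low:,.0f} to ${high:,.0f}")
# 10,000 accounts: $10,000 to $40,000
```

Under that linear assumption, profiling ten thousand accounts costs about what an organization might spend on a single contractor for a few months.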

Lermen's advice is blunt: "Could a team of smart investigators figure out who you are from your posts? If yes, LLM agents can likely do the same."

The defense options are limited. Changing your writing style manually is unreliable because the models pick up on dozens of subtle patterns you are not aware of. Using a separate LLM to rephrase your posts before publishing might help, but adds friction that most people will not maintain. The most reliable approach remains the simplest: share less identifying information in the first place.

This study was published in late February 2026 and the full paper is available on arXiv.
