
An AI-Written Paper Passed Peer Review. It Was Mediocre.

March 28, 2026

Six, seven, six. Those are the peer review scores an AI-generated machine learning paper received before being accepted at a workshop affiliated with ICLR, one of the top conferences in AI research. The paper landed in the top 45% of submissions. It was also, by most expert accounts, mediocre.

The system behind it, called The AI Scientist, was built by Jeff Clune and colleagues at the University of British Columbia. It handles the entire research pipeline on its own: generating hypotheses, searching literature, designing experiments, writing code, analyzing results, drafting a full manuscript in LaTeX, and even running its own internal peer review before submission. The team submitted three AI-generated papers to the "I Can't Believe It's Not Better" (ICBINB) workshop at ICLR 2025, with full permission from conference organizers. One was accepted.

The whole process took about 15 hours and cost roughly $140. A typical graduate student needs an entire semester to produce their first accepted workshop paper.

What "Accepted" Actually Means Here

Before anyone panics about robots replacing scientists, some context on the bar that was cleared. The ICBINB workshop had a 70% acceptance rate. That is a far looser filter than the main ICLR conference, which accepts around 32% of submissions. None of the AI Scientist's papers came close to main conference quality.

The accepted paper also had real problems. Reviewers and outside experts flagged hallucinated references (citations to papers that don't exist), duplicated figures, and weak methodological rigor. Maria Liakata noted the work lacked "real novelty." The system had creative sparks in its idea generation, but as Clune himself admitted, it "struggled with execution."

The paper was ultimately withdrawn per the team's protocol, since it was AI-generated. But the fact that it sailed through review at all says as much about the peer review system as it does about the AI.

The Real Story Is About Peer Review's Blind Spots

Peer review is supposed to be the quality filter for scientific knowledge. If an AI can produce a paper with fabricated citations and duplicated figures that three human reviewers still score favorably, that is a stress test the system failed.

Yanan Sui, a researcher who has studied the review process, warned that AI-generated papers "are probably going to make things much worse" for an already strained system. Major conferences have responded by prohibiting purely AI-authored submissions, but enforcement is a different question entirely. Detecting AI-written text is notoriously unreliable, especially in technical writing where the style is already formulaic.

The $140 price tag matters here. At that cost, a single bad actor could flood conferences with submissions. Even if most get rejected, the review burden falls on volunteer reviewers who are already overloaded.

This is less a story about AI getting smarter and more a story about a quality-control system that was already creaking under pressure. The AI Scientist didn't beat peer review. It found the cracks that were already there.
