The Data on Claude and rsync Bugs: Not What the Discourse Claimed

0.46. That's the p-value from a statistical analysis examining whether Claude-assisted commits made rsync buggier. By convention, a p-value below 0.05 is the threshold for calling a result statistically significant. 0.46 is nowhere near that.

rsync is a file synchronization utility that keeps files in sync between computers, servers, and backup systems. It runs silently on millions of Linux servers, developer machines, and CI pipelines every day, and it has been standard infrastructure since 1996. When rsync's maintainer began using Claude to help write code, online discussions surfaced quickly, claiming the AI was degrading the project's quality and introducing more bugs than before.

Researcher Alexis Purslane ran a statistical test on that claim. The methodology: analyze 46 rsync releases from v2.4.6 to v3.4.3, measuring bugs per 10 commits as the primary metric. Normalizing by commit count matters here - bigger releases have more code and naturally accumulate more bug reports, so raw bug counts would mislead.
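A minimal sketch of that normalization, to show why it matters. The function name and the two example releases are hypothetical illustrations, not figures from the rsync analysis:

```python
def bugs_per_10_commits(bug_count: int, commit_count: int) -> float:
    """Return the bug rate normalized to 10 commits."""
    if commit_count <= 0:
        raise ValueError("commit_count must be positive")
    return 10 * bug_count / commit_count


# Hypothetical releases (made-up numbers, not real rsync data).
small_release = bugs_per_10_commits(bug_count=6, commit_count=10)
large_release = bugs_per_10_commits(bug_count=30, commit_count=100)

print(small_release)  # 6.0
print(large_release)  # 3.0
```

In this made-up case the large release has five times as many raw bug reports but half the normalized rate, which is exactly the distortion that raw counts would hide.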

Bug Rates Across 46 Releases

The two releases with Claude contributions came in at 0.80 bugs per 10 commits (v3.4.2) and 6.76 bugs per 10 commits (v3.4.3). The historical mean across all 46 releases was 7.59 bugs per 10 commits. Claude's average across both releases was 3.78 - below the baseline, not above it. Both Claude releases landed in the middle 50% of all historical releases.
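The arithmetic behind that comparison is easy to check. The snippet below uses only the three figures quoted above; the variable names are illustrative:

```python
from statistics import mean

# Bug rates (bugs per 10 commits) as quoted in the article.
claude_rates = {"v3.4.2": 0.80, "v3.4.3": 6.76}
historical_mean = 7.59

claude_mean = mean(list(claude_rates.values()))
ratio = claude_mean / historical_mean

print(f"{claude_mean:.2f}")  # 3.78
print(f"{ratio:.2f}")        # 0.50
```

The two Claude-assisted releases average about half the historical mean.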

A permutation test - which randomly shuffles the data to check whether any observed difference is just noise - returned a p-value of 0.46, meaning randomly chosen pairs of releases had a bug rate at least as high as the Claude pair nearly half the time. A Fisher's exact test returned a p-value of 0.74. A runs test checking whether releases were randomly distributed over time found nothing unusual in the sequence. None of these point toward a problem.
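For readers unfamiliar with permutation tests, here is a rough sketch of the idea. It is not the researcher's code. It assumes a one-sided comparison of mean bug rates, and it enumerates every possible subset rather than sampling random shuffles, which is feasible here because 46 releases yield only 1,035 distinct pairs. The function name and the sample data are hypothetical:

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(rates, flagged):
    """One-sided exact permutation test (illustrative sketch).

    rates:   bug rate for each release
    flagged: indices of the releases of interest (e.g. AI-assisted)

    Returns the fraction of equally sized subsets of releases whose
    mean rate is at least as high as the flagged subset's mean.
    """
    observed = mean(rates[i] for i in flagged)
    subsets = list(combinations(range(len(rates)), len(flagged)))
    as_bad = sum(1 for s in subsets if mean(rates[i] for i in s) >= observed)
    return as_bad / len(subsets)


# Made-up rates for five releases; releases 1 and 2 are "flagged".
example_rates = [2.0, 4.0, 6.0, 8.0, 10.0]
p = permutation_p_value(example_rates, flagged=[1, 2])
print(p)  # 0.8
```

A large value like this means the flagged releases look unremarkable: most arbitrary pairs of releases do at least as badly.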

The Release Nobody Mentioned

The most striking data point in the whole analysis is v3.4.1: 113.33 bugs per 10 commits. That's the worst release in rsync's recorded history - roughly 15 times the historical mean. It had zero Claude commits. Nobody wrote posts about it. Nobody called for the maintainer to change their workflow.

That asymmetry is the real finding. When a release has problems and there's no AI involved, it's just a bad release. When AI is involved, the same or even a lower defect rate gets read as evidence of inherent AI unreliability. Confirmation bias isn't just about finding patterns that support your beliefs - it's also about ignoring counter-evidence that would undercut them.

None of this proves Claude is good at writing rsync code. Two releases is a tiny sample, and achieving statistical significance at this scale would be nearly impossible regardless of actual quality differences. The analysis doesn't establish that AI-assisted contributions are safe - it establishes that the criticism wasn't grounded in data.

That distinction matters as AI-assisted contributions spread through more open-source projects. Quality concerns about AI-generated code are legitimate and worth rigorous investigation. But "I believe AI code is worse" and "the data shows AI code is worse" are different claims. In rsync's case, the critics had only the first; the data does not support the second.