1,300 pull requests per week with zero human-written code. That's Stripe's headline number for its internal AI system called "Minions." It sounds massive. The math tells a different story.
Stripe employs roughly 3,500 engineers. Divide 1,300 PRs across that headcount and you get about 0.37 AI-generated PRs per engineer per week. That is one AI pull request every two to three weeks per person. Suddenly the number feels a lot more modest.
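For readers who want to check the division, here is a minimal sketch of the arithmetic. The two inputs are the figures quoted above (the headcount is a rough estimate, not an official number), and the function names are made up for illustration.

```python
# Figures quoted in this article; ENGINEERS is a rough estimate, not an official count.
AI_PRS_PER_WEEK = 1_300
ENGINEERS = 3_500


def prs_per_engineer_per_week(prs: int, engineers: int) -> float:
    """Average number of AI-generated PRs attributed to each engineer per week."""
    return prs / engineers


def weeks_between_prs(prs: int, engineers: int) -> float:
    """Average number of weeks between AI-generated PRs for a single engineer."""
    return engineers / prs


rate = prs_per_engineer_per_week(AI_PRS_PER_WEEK, ENGINEERS)
interval = weeks_between_prs(AI_PRS_PER_WEEK, ENGINEERS)

print(f"{rate:.2f} AI PRs per engineer per week")            # 0.37
print(f"one AI PR every {interval:.1f} weeks per engineer")  # 2.7
```

This is a flat average. It assumes the PRs are spread evenly across engineers, which is almost certainly not how the system is used in practice.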
What's Actually in These PRs?
Stripe hasn't said. That silence is the most interesting part of the entire claim.
The most likely candidates: test fixes, dependency updates, formatting changes, boilerplate generation, and migration scripts across microservices. These are exactly the kinds of mechanical, repetitive changes that AI handles well. They're also the kinds of changes that don't move the product forward in any way users would notice.
The company processes more than a trillion dollars in payments a year but won't clarify whether Minions touches payment code. One plausible reading of that ambiguity is that the system handles mechanical changes in payment-adjacent directories while core payment logic stays human-written.
There's also an inflation question. Stripe's initial claim said "over 1,000" AI PRs per week. Ten days later, that became "over 1,300" - a jump of roughly 30%, with no explanation of whether adoption actually grew or the definition of an AI PR expanded.
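The size of that jump is easy to verify. The sketch below treats the two headline figures as exact, which is itself an assumption: both were stated as "over" a threshold, so the true change could be somewhat different. The helper name is hypothetical.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) * 100 / old


# Headline figures taken at face value; both were stated as lower bounds ("over ...").
first_claim = 1_000
second_claim = 1_300

print(f"{percent_change(first_claim, second_claim):.0f}% increase")  # 30% increase
```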
The Review Bottleneck Nobody Talks About
Every one of those 1,300 PRs needs a human reviewer. Research from CodeRabbit found that AI-generated code contains 1.75x more logic errors and 2.74x more security vulnerabilities than human-written code. Flooding the review queue with AI-generated changes creates what's known as the "rubber stamp effect" - reviewers start cutting corners when volume overwhelms their capacity to scrutinize each change.
For a payments company, that's not a theoretical risk.
The Broader Pattern
Stripe's numbers fit a pattern emerging across the industry: AI PR counts go up, but product quality stays flat or declines. Google's 2024 DORA report found that a 25% increase in AI adoption was associated with an estimated 1.5% decline in delivery throughput and a 7.2% decline in delivery stability. GitClear's analysis of 211 million lines of code showed refactoring, a proxy for code health, collapsing from 25% of changed lines to under 10% as AI adoption rose. Faros AI studied 10,000 developers and found that despite a 98% increase in PRs merged, there was no measurable improvement in organizational delivery.
Meanwhile, Stripe's actual users still report years-long pain points: confusing API migrations, complex webhook ordering across 258+ event types, and degraded dashboard UX. A community guide titled "How to Implement Stripe and Stay Sane" has 6,200 GitHub stars - not exactly a sign of a product getting better.
The uncomfortable conclusion: PR count is a vanity metric. Stripe's Minions system is probably doing genuinely useful mechanical work. But presenting that work as a headline productivity achievement conflates volume with value. The real question isn't how many PRs AI can generate. It's whether the product improves faster because of them. On that measure, the evidence isn't there yet.