Nearly half of AI-generated code fails basic security tests. That stat, buried in a recent essay by Leonardo de Moura, should concern anyone who depends on software for, well, anything.
De Moura isn't some random critic. He spent twelve years building Lean, a proof assistant: a tool for mathematically proving that code does what it claims to do. Lean is now used by Google's AlphaProof, Amazon's Cedar authorization system, and Microsoft's cryptographic libraries. When he says AI-generated code has a verification problem, the people building AI coding tools have good reason to listen.
His argument is straightforward: AI is writing code faster than humans can review it, and the gap is only growing. Google and Microsoft report that 25-30% of their code is now AI-generated. Some projections put that at 95% by 2030. Anthropic recently built a 100,000-line C compiler in two weeks for under $20,000. The speed is real. The safety net is not.
Testing Catches Bugs. Math Catches All of Them.
Traditional code review and testing are probabilistic: you check for the problems you can think of. Formal verification (writing a mathematical proof that code meets its specification) is exhaustive: the proof covers every possible input, not just the cases someone thought to test. De Moura points to Heartbleed, a single bug in the widely used OpenSSL cryptographic library that survived two years of expert code review and cost the industry hundreds of millions of dollars. A formal proof, he argues, would have caught it before deployment.
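To make the contrast concrete, here is a toy sketch in Lean 4 that assumes only the core library. The function `safeLen` and the theorem `safeLen_le` are hypothetical illustrations, not code from OpenSSL or from de Moura's essay. They model a Heartbleed-style rule: a reply should never copy more bytes than the buffer actually holds, whatever length the request claims. The `example` line is a unit test that checks one input. The theorem is checked by Lean for every pair of natural numbers.

```lean
-- Hypothetical toy model: how many bytes a reply may copy, given the length
-- the request claims (`requested`) and the bytes actually held (`available`).
def safeLen (requested available : Nat) : Nat :=
  if requested ≤ available then requested else available

-- A test: checks the single case we thought to write down.
example : safeLen 5 10 = 5 := by decide

-- A proof: Lean checks that the bound holds for every possible input.
theorem safeLen_le (requested available : Nat) :
    safeLen requested available ≤ available := by
  unfold safeLen
  -- Case split on the `if`; each branch is simple arithmetic.
  split <;> omega
```

Suppose someone "simplified" `safeLen` to return `requested` unconditionally. The unit test would still pass, but Lean would reject `safeLen_le`, because the property fails for inputs such as `requested = 20, available = 10`. The proof guarantees only what the theorem states, so choosing the right property still falls to the engineer.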
The practical objection has always been cost. Writing formal proofs is painstaking, specialized work. But de Moura argues AI changes the economics completely. He cites a case where a single mathematician formalized the Prime Number Theorem (the classical result describing how prime numbers are distributed) in three weeks using AI assistance. The same task previously took over a year of manual work. If AI can write proofs that fast, verification stops being a bottleneck and becomes just another step in the pipeline.
The Real-World Stack Already Exists
This isn't purely theoretical. Lean's math library, Mathlib, contains over 200,000 formalized theorems contributed by 750 people. AWS uses Lean to verify Cedar, its authorization engine. Microsoft verified its SymCrypt cryptographic library with it. A developer named Kim Morrison recently verified a complete zlib compression library using AI-assisted proof generation.
De Moura's proposal: rebuild critical infrastructure layers (cryptography, compilers, parsers, storage engines) with formal proofs baked in. Not as an afterthought audit, but as specification-first development where engineers define what the code should do mathematically, then let AI generate both the implementation and the proof that it works.
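A minimal sketch of that specification-first workflow in Lean 4 follows, assuming only the core library. The specification `MaxSpec`, the implementation `pickMax`, and the theorem `pickMax_meets_spec` are hypothetical illustrations, not examples from de Moura's essay. The specification is written first. The implementation is the part AI would draft. The theorem is the part Lean checks mechanically.

```lean
-- 1. Specification, written first (hypothetical example): the result is
--    at least as large as each input and equal to one of them.
def MaxSpec (f : Nat → Nat → Nat) : Prop :=
  ∀ a b, a ≤ f a b ∧ b ≤ f a b ∧ (f a b = a ∨ f a b = b)

-- 2. Implementation (in the proposed workflow, AI could draft this part).
def pickMax (a b : Nat) : Nat :=
  if a ≤ b then b else a

-- 3. Proof that the implementation meets the specification for all inputs.
theorem pickMax_meets_spec : MaxSpec pickMax := by
  unfold MaxSpec
  intro a b
  unfold pickMax
  by_cases h : a ≤ b
  · -- Case a ≤ b: the `if` returns b.
    rw [if_pos h]
    exact ⟨h, Nat.le_refl b, Or.inr rfl⟩
  · -- Case ¬(a ≤ b): the `if` returns a.
    rw [if_neg h]
    exact ⟨Nat.le_refl a, by omega, Or.inl rfl⟩
```

If the two branches of `pickMax` were swapped so that it returned the smaller value, it would still compile as a function. However, Lean would reject `pickMax_meets_spec`, and no other proof could succeed, because the swapped version violates the specification for inputs such as `a = 0, b = 1`.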
What This Means for AI Coding Tools
For anyone using Cursor, GitHub Copilot, Claude Code, or similar tools, the immediate takeaway is practical: these tools generate code quickly, but that code is not automatically trustworthy. According to a 2022 study, poor software quality already costs the U.S. economy $2.41 trillion per year, and that was before AI coding assistants went mainstream.
De Moura's vision of specification-first development would flip the current workflow. Instead of writing code and hoping tests catch the problems, you would describe what the code needs to do in precise terms and let AI handle both the implementation and the mathematical proof of correctness. The engineer's job shifts from writing code to designing systems.
That future is probably years away for most development teams. But the underlying point stands right now: as AI writes more of our code, we need better ways to verify it than reading diffs and running test suites. The tools that figure out verification first will have a serious advantage over those that just optimize for speed.