The last time OpenAI announced a major mathematical breakthrough, independent researchers showed fairly quickly that the model had confabulated - generating plausible-looking but wrong mathematics with unwarranted confidence. That history makes the latest announcement worth paying attention to: the same skeptics are now saying the work holds up.
OpenAI claims its reasoning model has disproved a geometry conjecture that had remained open since 1946. The model produced a formal mathematical argument and, according to OpenAI's announcement, the mathematicians who previously exposed OpenAI's false claim reviewed this one and backed it.
A geometry conjecture from 1946 is not just an unsolved puzzle sitting in a drawer. It is a statement that professional mathematicians have tested, attacked from multiple angles, and failed to disprove across eight decades. Knocking one down requires constructing a counterexample or formal argument that survives scrutiny from people actively trying to find the flaw.
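To make "constructing a counterexample" concrete, here is a minimal sketch in Lean 4 of how one specific case refutes a universal statement. The claim being refuted is a deliberately trivial stand-in, not the 1946 conjecture, and the theorem name is hypothetical. Nothing in the announcement indicates that OpenAI's argument was written or checked in a proof assistant; this only illustrates the logical shape of a disproof.

```lean
-- Toy claim, NOT the 1946 conjecture: "every natural number n satisfies n * n > n".
-- The name `toy_claim_is_false` is hypothetical, chosen for this illustration.
theorem toy_claim_is_false : ¬ ∀ n : Nat, n * n > n :=
  -- Assume the claim holds for all n, then specialize it to n = 1.
  -- `h 1` asserts 1 * 1 > 1, which `decide` refutes by direct computation.
  fun h => absurd (h 1) (by decide)
```

A real counterexample to a long-standing conjecture is far harder to find and to check, but the structure is the same: the universal claim fails as soon as one verified instance contradicts it.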
What Reasoning Models Actually Do
OpenAI's o-series models - the ones that power the more capable ChatGPT tiers - work differently from standard language models. Rather than producing an answer directly, they are trained to work through a problem step by step first, generating a chain of intermediate reasoning before committing to a conclusion. That approach is what makes them stronger at complex coding, contract analysis, and multi-step math compared to earlier models.
Whether that step-by-step process can handle genuinely novel, hard mathematics has been an open question. Many of the math problems that AI models appear to solve turn out to be close variants of problems already in their training data. Disproving an 80-year-old conjecture would be different in kind - it requires constructing something new, not recognizing a pattern.
The Verification Gap That Still Exists
The mathematicians' endorsements are the most significant detail here, more so than the claim itself. AI systems regularly produce confident-looking mathematics that falls apart under close inspection, and OpenAI's own track record on math announcements is reason for added caution.
Peer-reviewed publication with full methodology would close the loop. Until that exists, the claim sits at "credible but unconfirmed." The endorsements from skeptical outside researchers move it meaningfully in the right direction. If the work holds up through formal review, it suggests reasoning models are doing something beyond sophisticated pattern matching on math notation - which has real implications for any field where multi-step logical deduction is the core skill, from scientific research to legal analysis to complex financial modeling.