OpenAI Launches GPT-5.4 With 1M Context Window and 83% on GDPval Professional Benchmark

March 6, 2026

What Happened

OpenAI released GPT-5.4 on March 5, 2026 in three variants: the standard model, GPT-5.4 Thinking for extended reasoning, and GPT-5.4 Pro for high-compute workloads.

The headline number is 83% on OpenAI's GDPval benchmark, which tests performance across 44 professional occupations. That's up from GPT-5.2's 70.9%. On SWE-Bench Pro, the coding benchmark, it scored 57.7%, slightly ahead of GPT-5.3-Codex's 56.8%. OSWorld-Verified, which tests computer use and UI interaction, came in at 75%, beating the human baseline of 72.4% and landing well above GPT-5.2's 47.3% (a gain of about 28 points, or roughly 1.6x).

The API supports a 1.05 million token context window with 128,000 max output tokens. Standard pricing is $2.50 per million input tokens and $15 per million output tokens. The Pro variant runs at $30/$180 respectively. Cached input tokens get a 90% discount at $0.25 per million.
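To make those prices concrete, here is a minimal sketch of a per-request cost estimate. The rates are the ones quoted above; the tier names, the `estimate_cost` function, and its parameters are hypothetical and not part of any OpenAI SDK. No cached rate is quoted for Pro, so the sketch refuses to guess one.

```python
# USD per million tokens, as quoted in this article.
# Tier keys and field names are illustrative, not official identifiers.
PRICES = {
    "standard": {"input": 2.50, "cached_input": 0.25, "output": 15.00},
    "pro": {"input": 30.00, "output": 180.00},  # no cached rate quoted for Pro
}

TOKENS_PER_MILLION = 1_000_000


def estimate_cost(tier, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request.

    input_tokens counts only uncached input; cached_input_tokens is billed
    at the discounted rate when the tier has one.
    """
    rates = PRICES[tier]
    if cached_input_tokens and "cached_input" not in rates:
        raise ValueError(f"no cached-input price known for tier {tier!r}")

    cost = input_tokens / TOKENS_PER_MILLION * rates["input"]
    cost += output_tokens / TOKENS_PER_MILLION * rates["output"]
    if cached_input_tokens:
        cost += cached_input_tokens / TOKENS_PER_MILLION * rates["cached_input"]
    return cost


if __name__ == "__main__":
    # One million input tokens and 100,000 output tokens on each tier.
    for tier in ("standard", "pro"):
        print(tier, round(estimate_cost(tier, 1_000_000, 100_000), 2))
```

At these rates, a request with one million input tokens and 100,000 output tokens works out to about $4 on the standard model and about $48 on Pro, a 12x gap.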

OpenAI describes this as "the first mainline reasoning model that combines frontier professional-work quality, frontier coding, native computer use, and 1.05M-context API support in the same default model." The model is rolling out now to ChatGPT Plus, Team, and Pro users; Enterprise and Edu access requires admin enablement. GPT-5.2 Thinking retires June 5, 2026.

Why It Matters

The context window is the practical story here. 1.05 million tokens means you can feed entire codebases, lengthy legal documents, or months of data into a single conversation without chunking strategies or RAG workarounds. For teams that have been working around the 128K-200K limits of previous models, this removes a real constraint.
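For a quick sanity check on whether a document set is likely to fit, a rough sketch like the following can help. It rests on several assumptions: that "1.05 million" means exactly 1,050,000 tokens, that output tokens share the same window as input, and that text averages about four characters per token. That last figure is a common rule of thumb for English, not a tokenizer, so real counts for code or other languages can differ noticeably. The function names are hypothetical.

```python
import math

# Limits quoted in the article. The exact integer behind "1.05 million"
# is an assumption.
CONTEXT_WINDOW_TOKENS = 1_050_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_tokens(num_chars, chars_per_token=4.0):
    """Rough token estimate from a character count (heuristic only)."""
    return math.ceil(num_chars / chars_per_token)


def fits_in_context(num_chars,
                    reserved_output_tokens=MAX_OUTPUT_TOKENS,
                    chars_per_token=4.0):
    """Return (fits, estimated_input_tokens, input_budget).

    Assumes output tokens count against the same window, so the input
    budget is the window minus whatever is reserved for the response.
    """
    budget = CONTEXT_WINDOW_TOKENS - reserved_output_tokens
    estimated = estimate_tokens(num_chars, chars_per_token)
    return estimated <= budget, estimated, budget


if __name__ == "__main__":
    # A 3 MB and a 4 MB plain-text corpus, measured in characters.
    for size in (3_000_000, 4_000_000):
        print(size, fits_in_context(size))
```

Under these assumptions, roughly 3 million characters fits with a full-length response reserved, while 4 million does not. For a real decision, count tokens with the model's actual tokenizer.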

The GDPval score matters for a different reason. 83% across 44 occupations means the model matched or exceeded professional-level performance on most of the knowledge-work tasks tested. That's not "it can help with your job." That's "it can do chunks of your job at professional quality." The jump from 70.9% to 83%, about 12 points in one generation, is steep.

API pricing stayed competitive. At $2.50 per million input tokens for the standard model, it's accessible for production workloads. The Pro tier at $30/$180 is expensive but aimed at tasks where compute-heavy reasoning justifies the cost.

Our Take

The real signal is convergence. GPT-5.4 combines what used to require separate specialized models - coding (Codex), reasoning (o-series), computer use, and long context - into one model. That simplifies the decision of which model to route to for a given task. One model, most use cases.

The benchmark improvements over GPT-5.2 are significant, but the competitive picture is what matters for tool choices. Claude Opus and Gemini Ultra are both operating in this tier. The differences between frontier models are narrowing to specific capability edges rather than broad quality gaps.

For ChatGPT users on Plus or Pro plans, the upgrade is automatic. For API developers, it's worth testing against your specific workloads before migrating - benchmark scores don't always translate to your use case. The June 5 retirement of GPT-5.2 Thinking gives you three months to migrate any workflows that depend on it.

The million-token context window alone makes this worth evaluating if you've been hitting limits. Everything else is incremental but adds up.
