
Ramp Engineers Cut Code Review Time from Hours to Minutes Using Codex


Code review is one of the most consistent bottlenecks in software engineering. A pull request sits idle waiting for a senior engineer's attention - sometimes hours, sometimes longer - before the author gets feedback meaningful enough to act on. Ramp, the corporate card and expense management platform, says they've largely solved this with Codex.

According to a case study published by OpenAI, Ramp engineers are now getting substantive code review feedback in minutes rather than hours using Codex with GPT-5.5. "Substantive" is the operative word - not just style flags or missing semicolons, but the kind of feedback that catches logic errors, edge cases, and architectural concerns that would normally require a senior engineer's eye.

Codex is OpenAI's AI system built specifically for software tasks. Unlike general-purpose models, it's designed to read code across multiple files, understand how components interact, and reason about what the code is actually supposed to do - not just what it literally says. GPT-5.5 is the underlying model driving it in Ramp's setup.

What This Actually Changes

The productivity pattern here is worth understanding carefully. Ramp isn't replacing human reviewers. They're changing when human reviewers engage. Junior engineers can iterate against Codex's feedback before the PR even hits a senior engineer's queue. Senior engineers use it to pre-screen PRs, so their attention goes to the decisions that actually require judgment - not the obvious bugs Codex would have caught anyway.

The arithmetic is straightforward: if Codex handles the first pass and catches, say, 60-70% of the issues, the senior engineer has fewer problems left to find, so the review becomes faster and more focused. The bottleneck doesn't disappear, but it shrinks.
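As a rough illustration of that arithmetic, here is a minimal sketch. The function name `remaining_issues`, the 10-issue pull request, and the catch rates are all hypothetical; the 60-70% range is the illustrative figure from the paragraph above, not a measured result from Ramp or OpenAI.

```python
def remaining_issues(total_issues: int, first_pass_catch_pct: int) -> int:
    """Issues left for the human reviewer after an automated first pass.

    Both arguments are illustrative assumptions: `total_issues` is the
    number of problems in a hypothetical PR, and `first_pass_catch_pct`
    is the assumed share (0-100) the automated pass catches.
    """
    if total_issues < 0:
        raise ValueError("total_issues must be non-negative")
    if not 0 <= first_pass_catch_pct <= 100:
        raise ValueError("first_pass_catch_pct must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    caught = (total_issues * first_pass_catch_pct) // 100
    return total_issues - caught


# Hypothetical PR with 10 issues, at both ends of the assumed range.
print(remaining_issues(10, 60))  # 4
print(remaining_issues(10, 70))  # 3
```

Under these assumptions the senior reviewer is left with 3-4 issues instead of 10. The model deliberately ignores everything else that affects review time, such as how hard the remaining issues are to spot.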

This mirrors what teams using Cursor, [Claude Code](/tools/claude-code/), and Cody are reporting for code generation - that the biggest gains come from integrating AI into specific workflow steps rather than using it as a smarter autocomplete. Review is a different step from writing, but the principle is the same.

Why Ramp Specifically Matters Here

Ramp processes billions of dollars in business spend. Their engineering team isn't a 10-person startup. A case study from a company operating at that scale - where code review failures have real financial consequences - is more credible than a benchmark or a demo.

For engineering managers evaluating AI coding tools, the Ramp example suggests the entry point isn't generation but review: lower stakes, faster feedback loop on whether the tool is actually useful, and an easier sell to engineers who are skeptical about AI writing their code but open to AI reviewing it first.