Spec-Driven Development: Using Cucumber Tests to Steer Claude Code

April 3, 2026 · 2 min read

What if the best way to use an AI coding agent isn't better prompts, but better tests?

Developer Gordon Burgett makes a compelling case for pairing Cucumber - a behavior-driven testing framework that uses plain English specifications - with Claude Code. Instead of describing what you want in a chat message and hoping the agent interprets it correctly, you write formal feature files with "Given... When... Then..." scenarios, then point Claude Code at them with a two-part instruction: implement these, and don't modify the specs.
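To make the "Given... When... Then..." shape concrete, here is a minimal sketch of how a scenario maps onto executable steps. It is a toy matcher written for illustration, not Cucumber's actual API, and the shopping-cart feature, the `step` decorator, and `run_scenario` are all hypothetical names that do not come from Burgett's write-up.

```python
import re

# Hypothetical feature text; a real project would keep this in a .feature file.
FEATURE = """\
Feature: Shopping cart
  Scenario: Adding an item increases the total
    Given an empty cart
    When I add an item costing 5 dollars
    Then the cart total is 5 dollars
"""

STEPS = []  # (compiled pattern, handler) pairs, filled in by @step


def step(pattern):
    """Register a handler for any step whose text fully matches the pattern."""
    def register(func):
        STEPS.append((re.compile(pattern + r"$"), func))
        return func
    return register


@step(r"an empty cart")
def empty_cart(ctx):
    ctx["total"] = 0


@step(r"I add an item costing (\d+) dollars")
def add_item(ctx, price):
    ctx["total"] += int(price)


@step(r"the cart total is (\d+) dollars")
def check_total(ctx, expected):
    assert ctx["total"] == int(expected), f"total was {ctx['total']}"


def run_scenario(feature_text):
    """Run each Given/When/Then line in order; return the step texts executed."""
    ctx, executed = {}, []
    for line in feature_text.splitlines():
        match = re.match(r"\s*(Given|When|Then|And|But)\s+(.*)", line)
        if not match:
            continue  # Feature:/Scenario: headers and blank lines
        text = match.group(2).strip()
        for pattern, handler in STEPS:
            found = pattern.match(text)
            if found:
                handler(ctx, *found.groups())
                executed.append(text)
                break
        else:
            raise LookupError(f"no step definition for: {text}")
    return executed


print(run_scenario(FEATURE))
```

The point of the split is that the feature text states the requirement while the step handlers hold the implementation, so the agent can be asked to write the second without touching the first.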

The Core Idea

Cucumber scenarios break requirements into atomic steps, each requiring roughly 50-100 lines of code. That's a sweet spot for current AI coding agents - small enough to implement reliably, structured enough to verify automatically. When a test fails, the AI gets the business requirement re-injected alongside the error, which prevents a common failure mode: the agent "fixing" a broken test by changing the test instead of fixing the code.
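One way to picture the requirement being "re-injected" is a failure report that always carries the scenario text next to the error. The sketch below is an assumption about what such a message could look like; `build_failure_feedback` and its wording are hypothetical, and the article does not describe the exact output Claude Code receives.

```python
def build_failure_feedback(scenario_text, failed_step, error):
    """Compose a message for the agent that pairs the spec with the failure.

    Hypothetical helper: the scenario text is repeated in full so the
    business requirement sits right beside the error it produced.
    """
    return "\n".join([
        "A scenario failed. The specification is fixed; change the code, not the spec.",
        "",
        "Scenario under test:",
        scenario_text.strip(),
        "",
        f"Failed step: {failed_step}",
        f"Error: {type(error).__name__}: {error}",
    ])


# Usage with made-up values:
message = build_failure_feedback(
    "Scenario: Adding an item increases the total\n"
    "  Given an empty cart\n"
    "  When I add an item costing 5 dollars\n"
    "  Then the cart total is 5 dollars",
    "Then the cart total is 5 dollars",
    AssertionError("total was 0"),
)
print(message)
```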

This is a real problem. Anyone who has used Claude Code, Cursor, or similar tools has watched the agent silently weaken a test assertion to make it pass. Locking the specifications in read-only feature files removes that escape hatch.
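The article does not say how the feature files are locked. As one possible safeguard, a script could fingerprint the specs before the agent runs and compare afterward. This is a sketch under that assumption; `snapshot_specs`, `changed_specs`, and the `features` directory name are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot_specs(spec_dir):
    """Map each .feature file (path relative to spec_dir) to its SHA-256 digest."""
    root = Path(spec_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.feature"))
    }


def changed_specs(before, after):
    """Return sorted paths that were added, removed, or edited between snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )


# Usage (directory name is an assumption):
#   before = snapshot_specs("features")
#   ... let the agent work ...
#   tampered = changed_specs(before, snapshot_specs("features"))
#   if tampered:
#       raise SystemExit(f"spec files were modified: {tampered}")
```

A check like this detects tampering after the fact; file permissions or the agent's own access settings would be needed to prevent the edit in the first place.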

Real-World Results

Burgett ran two experiments. First, a Rails feature extraction: Claude Code completed the implementation, wrote integration tests, and shipped working code in roughly 2 hours of autonomous operation. Second, a camera firmware project involving WebRTC peer connections (real-time video/audio communication between devices) across 6 features and 16 scenarios. That took about 8 hours of autonomous work.

Neither project required constant babysitting. The developer's job shifted from writing code to writing specifications and reviewing architectural decisions afterward.

A Practical Middle Ground

The approach sits between two extremes that don't work well: giving the agent vague instructions and hoping for the best, or micromanaging every implementation detail through chat. Writing specs is "principal or staff-level thinking," as Burgett puts it, while the implementation is "senior engineer-level work" that the agent handles.

This pattern should generalize beyond Cucumber. Any testing framework that separates the "what" from the "how" - RSpec feature specs, Playwright test descriptions, even well-structured Jest tests - could serve the same purpose. The key insight is that AI agents do better work when requirements are machine-verifiable, not just human-readable.
