
Claude's thinking_mode and reasoning_effort API Tags Confirmed Real

April 11, 2026 2 min read


Claude's API accepts two underdocumented control tags, <thinking_mode> and <reasoning_effort>, that let developers adjust how much the model reasons before generating a response. A developer confirmed that both tags work on enterprise Claude subscriptions and that API responses explicitly surface these parameters.

The tags do what their names suggest. <reasoning_effort> controls how much computational work Claude does before answering. Higher effort means slower, more thorough responses, better suited to complex multi-step problems. Lower effort means faster, cheaper responses, sufficient for simpler classification or formatting tasks.

<thinking_mode> toggles extended thinking on or off. Extended thinking is Claude's step-by-step reasoning process, in which the model works through a problem internally before writing its response, much like showing your work before giving a final answer.
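As a rough illustration only, here is how the two tags might be assembled into a text prefix in Python. The tag names come from the report above; everything else is an assumption. The accepted values ("on"/"off", "low"/"medium"/"high"), the closing-tag format, and the helper name build_control_prefix are all hypothetical, and the report does not say where in a request the tags belong.

```python
# Hypothetical values: the report does not list the accepted settings.
ALLOWED_THINKING_MODES = {"on", "off"}
ALLOWED_EFFORT_LEVELS = {"low", "medium", "high"}


def build_control_prefix(thinking_mode: str, reasoning_effort: str) -> str:
    """Return both control tags as a text prefix (the format is an assumption)."""
    if thinking_mode not in ALLOWED_THINKING_MODES:
        raise ValueError(f"unknown thinking_mode: {thinking_mode!r}")
    if reasoning_effort not in ALLOWED_EFFORT_LEVELS:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    return (
        f"<thinking_mode>{thinking_mode}</thinking_mode>\n"
        f"<reasoning_effort>{reasoning_effort}</reasoning_effort>"
    )


if __name__ == "__main__":
    # Request thorough reasoning for a hard problem.
    print(build_control_prefix("on", "high"))
```

Validating the values up front means a typo fails loudly in your own code rather than being silently passed along.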

Neither tag is prominently documented in Anthropic's public API reference, which led to genuine disagreement in developer communities about whether they were real features or whether Claude was generating plausible-sounding but fictional parameters when asked about its own configuration.

What This Means for Developers

For teams building on Claude, these controls add flexibility over cost and response quality at the per-call level. A legal document analysis might warrant maximum reasoning effort. A batch job classifying short text snippets doesn't need it. Being able to set reasoning depth per request - without switching between entirely different model versions - simplifies how you architect applications that handle varied task types.
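One way to picture per-call control is a small lookup that maps each task type to its own reasoning settings. The Python sketch below is illustrative, not a documented integration: the task names, the setting values, and the settings_for helper are hypothetical, chosen to mirror the legal-analysis and snippet-classification examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningSettings:
    thinking_mode: str      # assumed values: "on" or "off"
    reasoning_effort: str   # assumed values: "low", "medium", "high"


# Hypothetical routing table: heavy reasoning only where the task warrants it.
TASK_SETTINGS = {
    "legal_analysis": ReasoningSettings("on", "high"),
    "snippet_classification": ReasoningSettings("off", "low"),
}

# Middle-of-the-road fallback for task types not listed above.
DEFAULT_SETTINGS = ReasoningSettings("on", "medium")


def settings_for(task_kind: str) -> ReasoningSettings:
    """Pick reasoning settings for one request based on its task type."""
    return TASK_SETTINGS.get(task_kind, DEFAULT_SETTINGS)


if __name__ == "__main__":
    for kind in ("legal_analysis", "snippet_classification", "summarization"):
        print(kind, settings_for(kind))
```

Keeping the mapping in one table means the cost and quality trade-off for each task type can be tuned without touching the code that sends requests.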

The confirmation also suggests that Anthropic is moving in the same direction as OpenAI with its o-series reasoning models: giving developers an explicit dial over inference behavior (how the model processes inputs and generates outputs) rather than baking fixed performance settings into each model tier. Both companies are converging on the same pattern: replace the binary "fast model vs. smart model" choice with per-call controls that let you match compute to task complexity.
