What Happened
OpenAI published research on March 5, 2026 introducing CoT-Control, an open-source evaluation suite designed to answer a specific question: can reasoning models deliberately control or hide what's happening in their chain of thought?
The suite includes over 13,000 tasks built from established benchmarks: GPQA, MMLU-Pro, HLE, BFCL, and SWE-Bench Verified. Each task pairs a benchmark problem with an instruction about the chain of thought itself, such as avoiding certain problem-relevant keywords, writing only in lowercase, or appending a specific word to each sentence. These are simple constraints that a model should be able to satisfy if it has fine-grained control over its reasoning process.
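To make those three instruction types concrete, here is a minimal sketch of how compliance with each could be checked on a reasoning trace. This is not the CoT-Control grading code: the function names, the sentence-splitting rule, and the word-boundary matching are all assumptions made for illustration.

```python
import re


def avoids_keywords(trace: str, banned: list[str]) -> bool:
    """True if none of the banned words appear as whole words (case-insensitive)."""
    lowered = trace.lower()
    return not any(
        re.search(r"\b" + re.escape(word.lower()) + r"\b", lowered)
        for word in banned
    )


def is_lowercase_only(trace: str) -> bool:
    """True if the trace contains no uppercase letters."""
    return not any(ch.isupper() for ch in trace)


def split_sentences(trace: str) -> list[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", trace.strip())
    return [p for p in parts if p]


def ends_each_sentence_with(trace: str, word: str) -> bool:
    """True if the last word of every sentence equals `word` (case-insensitive)."""
    sentences = split_sentences(trace)
    if not sentences:
        return False
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z0-9']+", sentence)
        if not tokens or tokens[-1].lower() != word.lower():
            return False
    return True
```

A trace such as "first step done meow. second step done meow." would pass the lowercase and append-a-word checks under these rules. A real grader would likely need more careful sentence segmentation and keyword matching than this sketch uses.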
The results: frontier reasoning models are bad at this. CoT controllability scores across all models tested ranged from 0.1% to a maximum of 15.4%. No current frontier model could reliably follow instructions about how to structure its own thinking, even when explicitly told it was being monitored.
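One way to read those percentages is as the share of reasoning traces that followed the chain-of-thought instruction. The sketch below assumes that interpretation; the article does not give the exact formula, and the function name is hypothetical.

```python
def controllability_rate(compliant: list[bool]) -> float:
    """Percentage of graded traces that complied with the CoT instruction.

    Assumes one boolean per trace; this is an illustrative definition,
    not necessarily the metric used by CoT-Control.
    """
    if not compliant:
        raise ValueError("need at least one graded trace")
    return 100.0 * sum(compliant) / len(compliant)
```

Under this reading, a model that complied on 154 of 1,000 traces would score 15.4%, the top of the reported range.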
Three additional findings stood out. Larger models showed slightly higher controllability than smaller ones, though their scores remained low. Controllability decreased as models were asked to reason for longer. And additional post-training (the kind of fine-tuning that makes models more capable) reduced controllability further.
Why It Matters
This matters because chain-of-thought monitoring is one of the primary tools safety teams use to detect when a model might be behaving deceptively. If a model could control its chain of thought, it could theoretically "think" one thing while showing monitors something else, reasoning about harmful actions while its visible thought process looks benign.
The finding that controllability maxes out at 15.4% is reassuring. It means current reasoning models have little ability to reshape their thinking on command, even when you try to make them hide it. The reasoning traces you see in ChatGPT, Claude, and other tools that expose chain of thought are hard for the model to manipulate deliberately, which makes them, for now, a reliable window into what the model is actually doing.
For people building AI-powered workflows, this has a practical implication: chain-of-thought logs remain trustworthy as a debugging and auditing tool. When your agent produces an unexpected result and you check its reasoning trace, that trace is very likely showing you the real reasoning path.
Our Take
This is the kind of safety research that actually matters. Not a theoretical framework or a position paper, but an open-source benchmark that produces measurable numbers. CoT-Control gives the field a concrete way to track whether this property holds as models get more capable.
The concerning trend to watch is that controllability, while low, is higher in larger models. If that trend continues as models scale further, a sufficiently capable model could eventually learn to manipulate its own reasoning traces. OpenAI acknowledges this, which is why they're publishing the benchmark now, so the community can track it.
The fact that post-training reduces controllability is the most interesting finding. It suggests that the reinforcement learning that makes models better at tasks also makes their thinking more "honest": they become more dependent on their genuine reasoning chain and less able to rearrange it cosmetically.
For now, if you're relying on chain-of-thought visibility for oversight of AI agents in production, this research says you're on solid ground. That won't necessarily stay true forever, and having quantitative tracking of how it changes is exactly what responsible deployment looks like.