
Andon Labs Put AI Models in Charge of Radio Stations. Here's What Happened.

May 15, 2026 2 min read

What happens when you hand a radio station to an AI and walk away? Andon Labs decided to find out.

The company has been running a series of experiments in which AI agents operate entire businesses without human intervention. Its latest project puts four of the most widely used AI models in the broadcast booth: Claude runs "Thinking Frequencies," ChatGPT runs "OpenAIR," Google's Gemini runs "Backlink Broadcast," and Grok anchors "Grok and Roll."

The concept is a stress test disguised as a stunt. Running a radio station requires ongoing judgment: what to play, what to say, how to respond when something goes wrong. There's no single correct answer to any of it - just a constant stream of decisions that need to feel coherent over time. That's exactly the kind of task that exposes the gaps in current AI systems.

The Problem With Unsupervised Decisions

AI agents today are genuinely capable of handling multi-step tasks, but they're not good at knowing when to stop, ask for help, or flag that something is off. Most failures in autonomous AI setups don't happen because the model does something wildly wrong. They happen because it confidently does the slightly wrong thing, over and over, with no one watching.

Radio is a particularly good test of this because everything goes out live and in public. A human host notices when the room goes quiet or the phones stop ringing. An AI agent running unsupervised has no equivalent signal. It keeps going.

Andon Labs has been deliberate about making these experiments visible rather than polished. That matters. Most companies running AI agents in production have strong incentives to hide the weird edge cases. A public-facing radio station can't do that - every misstep happens live.

What This Actually Tests

The four stations set up a useful informal comparison. Claude, ChatGPT, Gemini, and Grok all have different design philosophies and different tendencies when given open-ended tasks with minimal constraints. Watching them manage the same job - pick content, fill time, stay on-brand - is arguably a more honest benchmark than a controlled lab test, because nothing about the setting is tuned to make the models look good.

None of these models are built to run a business alone. The interesting question isn't whether they fail; they will. The question is how they fail and whether those failure modes are predictable enough to work around.

For anyone building workflows that involve AI agents handling tasks without real-time human review, Andon Labs' radio experiment is a useful reminder: the bottleneck isn't usually getting the AI to do the task. It's building in the checkpoints that catch the quiet mistakes before they compound.
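To make the idea of a checkpoint concrete, here is a minimal Python sketch. It is not how Andon Labs runs its stations; the function name `run_with_checkpoints`, the `window` and `max_repeats` parameters, and the sample actions are all hypothetical, chosen only for illustration. It assumes the simplest possible "quiet mistake": the agent repeating the same action too often in a short stretch, which the loop catches and hands back for human review instead of letting it compound.

```python
from collections import Counter, deque


def run_with_checkpoints(actions, window=5, max_repeats=3):
    """Perform actions in order, stopping when one action dominates recent history.

    Returns (completed, flagged): the actions performed before the checkpoint
    fired, and the action that triggered it (None if the run finished cleanly).
    All names and thresholds here are illustrative assumptions.
    """
    recent = deque(maxlen=window)  # sliding window over the latest actions
    completed = []
    for action in actions:
        recent.append(action)
        # Checkpoint: too many repeats inside the window suggests a quiet loop,
        # so stop and escalate rather than carry on confidently.
        if Counter(recent)[action] > max_repeats:
            return completed, action
        completed.append(action)
    return completed, None


# Hypothetical playlist in which the agent gets stuck on one song.
playlist = [
    "play song A",
    "station ID",
    "play song A",
    "play song A",
    "play song A",
    "play song A",
]
completed, flagged = run_with_checkpoints(playlist)
print(f"performed {len(completed)} actions; flagged for review: {flagged}")
```

A repetition counter is only one possible check. The same loop structure would work with any test that can be evaluated cheaply on each step, with the important design choice being that the agent stops and asks rather than continuing.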
