
Mozilla's Star Chamber Sends Code Reviews to Multiple LLMs at Once


What happens when you ask three different AI models to review the same code and only keep the findings they agree on? That's the premise behind Star Chamber, a new open-source tool from Mozilla.ai that sends code reviews to multiple LLM providers simultaneously and synthesizes the results into consensus findings.

How It Works

Star Chamber connects to Claude, GPT-4o/GPT-5.2, Gemini, Mistral, and Llama through Mozilla's open-source any-llm library (Apache-licensed). It runs in two modes:

  • Parallel mode: All providers review independently in a single round. Fast, no cross-talk.
  • Debate mode: Multiple rounds where an anonymous synthesis of all findings gets shared back to each provider. No model knows which provider said what (a "Chatham House rule" approach). Models can revise their positions based on the group's feedback.
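The difference between the two modes is mostly control flow. Below is a minimal Python sketch of it. It assumes each provider can be wrapped as an async function that takes the code and an optional anonymized summary and returns a list of findings. The `Reviewer` type and the `make_stub`, `parallel_round`, `anonymize`, and `debate` names are hypothetical. The stub reviewers stand in for real model calls and are not any-llm's API or Star Chamber's actual implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical provider wrapper: (code, anonymized summary or None) -> findings.
# Real provider calls would go here; this is not any-llm's or Star Chamber's API.
Reviewer = Callable[[str, Optional[str]], Awaitable[list[str]]]


def make_stub(findings: list[str]) -> Reviewer:
    """Toy reviewer that reports its own findings plus any it sees in the summary."""
    async def review(code: str, summary: Optional[str]) -> list[str]:
        seen = [line[2:] for line in (summary or "").splitlines()]  # drop "- "
        return sorted(set(findings) | set(seen))
    return review


async def parallel_round(code: str, reviewers: dict[str, Reviewer],
                         summary: Optional[str] = None) -> dict[str, list[str]]:
    """Query every provider concurrently with the same input (no cross-talk)."""
    names = list(reviewers)
    results = await asyncio.gather(*(reviewers[n](code, summary) for n in names))
    return dict(zip(names, results))


def anonymize(results: dict[str, list[str]]) -> str:
    """Merge all findings into one list with provider names removed."""
    merged = sorted({f for findings in results.values() for f in findings})
    return "\n".join(f"- {f}" for f in merged)


async def debate(code: str, reviewers: dict[str, Reviewer],
                 rounds: int = 2) -> dict[str, list[str]]:
    """Run several rounds, sharing only the anonymized synthesis between them."""
    results = await parallel_round(code, reviewers)
    for _ in range(rounds - 1):
        results = await parallel_round(code, reviewers, anonymize(results))
    return results
```

Parallel mode corresponds to a single `parallel_round`. Debate mode repeats it, and because `anonymize` drops provider names, no model can tell who raised which point. The toy stubs adopt every shared finding. A real model could just as well reject some of them, which is what makes the later rounds informative.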

Findings get classified by consensus strength: "all providers agreed," "majority (2+) agreed," or "individual observation." Each issue gets tagged with severity (high/medium/low), file location, and category (correctness, architecture, maintainability, craftsmanship).
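The consensus buckets can be illustrated with a short Python sketch. It assumes each provider returns structured findings and that two findings count as the same issue when file, category, and summary match exactly. That matching rule, the `Finding` and `classify` names, the `min_agree` parameter, and the choice to keep the highest reported severity are all assumptions for illustration. The source does not describe how Star Chamber actually groups findings.

```python
from collections import defaultdict
from dataclasses import dataclass

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Finding:
    provider: str
    file: str
    category: str   # correctness, architecture, maintainability, craftsmanship
    severity: str   # high, medium, low
    summary: str


def classify(findings: list[Finding], n_providers: int,
             min_agree: int = 2) -> dict[str, list[tuple]]:
    """Bucket issues by how many distinct providers reported them.

    Issues are keyed on exact (file, category, summary) matches, a simplifying
    assumption. Each entry keeps the highest severity any provider assigned.
    """
    supporters: dict[tuple, set[str]] = defaultdict(set)
    worst: dict[tuple, str] = {}
    for f in findings:
        key = (f.file, f.category, f.summary)
        supporters[key].add(f.provider)
        if key not in worst or SEVERITY_RANK[f.severity] > SEVERITY_RANK[worst[key]]:
            worst[key] = f.severity

    buckets: dict[str, list[tuple]] = {"all": [], "majority": [], "individual": []}
    for key, providers in supporters.items():
        entry = (*key, worst[key])
        if len(providers) == n_providers:
            buckets["all"].append(entry)
        elif len(providers) >= min_agree:
            buckets["majority"].append(entry)
        else:
            buckets["individual"].append(entry)
    return buckets
```

Exact-match grouping is the fragile part of this sketch: two models rarely word the same issue identically, so a real tool would likely need a fuzzier way to decide that findings agree.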

The Self-Review Test

Mozilla ran Star Chamber against its own source code with three providers. The results: 3 issues on which all providers agreed (including a security risk around API key masking), 2 majority issues, and several individual observations. Notably, debate mode surfaced gaps in convergence detection that parallel mode missed.

The tool works as a standalone CLI (uvx star-chamber review), a Python library, or a Claude Code skill invoked with /star-chamber. Configuration lives at ~/.config/star-chamber/providers.json with configurable timeouts (default 60 seconds) and consensus thresholds.

The Multi-LLM Trend

This isn't an isolated idea. Perplexity, working independently, launched its "Model Council" on February 5, 2026, built on the same multi-LLM consensus concept. The convergence suggests a real pattern: single-model outputs have known blind spots, and cross-referencing multiple models is a practical way to separate signal from noise.

The practical question is cost. Running three or five LLM providers on every code review multiplies your API spend roughly 3-5x, and debate mode's additional rounds push it higher. For critical architectural decisions or security-sensitive code, that math works. For routine PRs, it probably doesn't. Star Chamber's provider selection flags (-p, to pick specific models per review) let you dial this up or down, which is the right design choice.
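A rough estimate shows how the multiplier grows. The Python sketch below assumes every provider sees the same prompt in each round and that later debate rounds also carry the anonymized synthesis as extra input tokens. The prices and token counts are made-up placeholders, not real provider rates, and `review_cost` is a hypothetical helper.

```python
# Hypothetical prices: provider -> (USD per 1M input tokens, USD per 1M output tokens).
# These are placeholders, not real provider rates.
PRICES = {
    "provider_a": (2.0, 10.0),
    "provider_b": (2.0, 10.0),
    "provider_c": (2.0, 10.0),
}


def review_cost(prices: dict[str, tuple[float, float]], input_tokens: int,
                output_tokens: int, rounds: int = 1, summary_tokens: int = 0) -> float:
    """Estimate the USD cost of one review across all listed providers.

    Each round re-sends the code to every provider; rounds after the first
    also include the anonymized synthesis (summary_tokens) as extra input.
    """
    total = 0.0
    for in_price, out_price in prices.values():
        for r in range(rounds):
            extra = summary_tokens if r > 0 else 0
            total += (input_tokens + extra) * in_price / 1e6
            total += output_tokens * out_price / 1e6
    return total


single = review_cost({"provider_a": PRICES["provider_a"]}, 10_000, 1_000)
parallel = review_cost(PRICES, 10_000, 1_000)
debate = review_cost(PRICES, 10_000, 1_000, rounds=2, summary_tokens=2_000)

print(f"parallel: {parallel / single:.1f}x a single-model review")  # 3.0x
print(f"debate:   {debate / single:.1f}x a single-model review")    # 6.4x
```

Under these placeholder numbers, two debate rounds cost more than twice a parallel review, because every provider is called again with a longer prompt. That compounding is why choosing providers per review matters.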

The roughly 3,000-line codebase is published to PyPI and available now.