Developer Runs 4 Claude Code Agents Offline With Local Open-Source Models

March 22, 2026 2 min read

A developer has demonstrated a fully offline, multi-agent coding setup that runs four AI agents in parallel on local hardware, with no API calls leaving the machine.

The setup combines three pieces: vLLM (an open-source inference server that runs large language models on local GPUs), gpt-oss-120b (OpenAI's open-weight model with roughly 120 billion parameters), and Claude Code's agent teams feature, which lets multiple AI agents work on a codebase simultaneously and coordinate with each other.

The result, shown in a demo video, is four agents collaborating concurrently on coding tasks (what the developer community calls "vibe coding") without any network connection.

How the Pieces Fit Together

Claude Code normally calls Anthropic's API for its AI capabilities. But its architecture supports swapping in alternative model providers, including local ones. By pointing Claude Code at a vLLM server running on the same machine, the developer replaced cloud API calls with local inference.
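As a rough illustration of that redirection, the Python sketch below builds the environment a launcher script might pass to Claude Code so that its requests go to a local server rather than Anthropic's cloud. The developer's actual configuration was not published, so everything here is an assumption: the port (8000 is vLLM's default), the placeholder token, the model name as served, and the environment variable names, which should be checked against current Claude Code documentation. It also assumes the local server can answer requests in the format Claude Code sends.

```python
import os


def local_claude_env(base_url="http://localhost:8000",
                     model="gpt-oss-120b",
                     base_env=None):
    """Return a copy of the environment with Claude Code pointed at a local server.

    Every variable name set here is an assumption to verify against
    Claude Code's documentation; none comes from the demo itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url                # assumed: overrides the API endpoint
    env["ANTHROPIC_AUTH_TOKEN"] = "local-placeholder"   # assumed: dummy value for a local server
    env["ANTHROPIC_MODEL"] = model                      # assumed: model name as served locally
    return env


if __name__ == "__main__":
    env = local_claude_env()
    print(env["ANTHROPIC_BASE_URL"])
    # A launcher could then start the CLI with this environment, e.g.
    # subprocess.run(["claude"], env=env)
```

Keeping the override in a copied environment, rather than in global shell settings, makes it easy to switch between local and cloud inference per session.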

vLLM handles the heavy lifting of running the 120B-parameter model efficiently. How much GPU memory that takes depends heavily on quantization (a technique that reduces model size by storing weights as lower-precision numbers, trading some accuracy for speed and memory savings). OpenAI distributes gpt-oss-120b in a 4-bit format (MXFP4) and describes it as running on a single 80GB GPU. On consumer cards with 24GB each, the model would have to be split across several GPUs.
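The memory arithmetic behind that hardware estimate can be sketched in a few lines of Python. This is a back-of-the-envelope calculation for the weights only: it assumes a flat 120 billion parameters at a uniform bit width and ignores the KV cache, activations, and framework overhead, all of which add to the real requirement. The function names are illustrative, not part of any library.

```python
import math


def weight_memory_gb(num_params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def min_gpus_for_weights(weights_gb, vram_per_gpu_gb):
    """Smallest GPU count whose combined VRAM holds the weights, with no headroom."""
    return math.ceil(weights_gb / vram_per_gpu_gb)


if __name__ == "__main__":
    params = 120e9  # assumed round figure for a "120B" model
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{bits:>2}-bit: {gb:.0f} GB of weights, "
              f"at least {min_gpus_for_weights(gb, 24)} GPUs at 24 GB "
              f"or {min_gpus_for_weights(gb, 80)} at 80 GB")
```

Under these assumptions, 4-bit weights come to about 60 GB, which is more than two 24 GB cards can hold but fits on one 80 GB card. At 16 bits the same weights need about 240 GB.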

Claude Code's agent teams feature, which is distinct from its simpler subagents, lets multiple agents run in parallel, each with its own context window, while coordinating on shared tasks. Each agent can read files, write code, and run commands independently.
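The coordination pattern described here (parallel workers, private context, shared tasks) can be pictured with a toy Python model. This is not how Claude Code implements agent teams, and every name in it is hypothetical. It only shows four workers pulling from one shared task queue while each keeps its own private history. A real agent would call the model and edit files where the comment indicates.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_agent(name, tasks, results):
    """One toy agent: private context, shared task queue, shared results dict."""
    context = []  # private to this agent, standing in for its own context window
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return context  # no work left
        context.append(task)
        # Placeholder for real work (model call, file edits, shell commands).
        results[task] = name  # single dict assignment is thread-safe in CPython


def run_team(task_list, num_agents=4):
    """Run num_agents toy agents in parallel over a shared list of tasks."""
    tasks = queue.Queue()
    for task in task_list:
        tasks.put(task)
    results = {}
    names = [f"agent-{i}" for i in range(1, num_agents + 1)]
    with ThreadPoolExecutor(max_workers=num_agents) as pool:
        futures = {n: pool.submit(run_agent, n, tasks, results) for n in names}
        contexts = {n: f.result() for n, f in futures.items()}
    return results, contexts


if __name__ == "__main__":
    done, contexts = run_team(["parse config", "write tests", "fix lint", "update docs"])
    for task, agent in sorted(done.items()):
        print(f"{task!r} handled by {agent}")
```

Which agent picks up which task varies from run to run. The only guarantees are that each task is handled exactly once and that no agent sees another's context.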

What This Means for Local AI Development

Running AI coding agents locally has a few clear advantages: no API costs, no data leaving your network, and no rate limits. For companies with strict data policies or developers working on sensitive codebases, offline operation removes what is often the biggest barrier to using AI coding tools.

The trade-off is hardware cost and model quality. A 120B open-source model is capable, but current cloud-hosted models from Anthropic and OpenAI still outperform open-source alternatives on complex coding tasks. The gap is closing, though, and for straightforward coding work the difference may not matter.

The setup also required switching to Linux, which is still the path of least resistance for local AI inference. Windows and macOS support for tools like vLLM exists but remains rougher around the edges.

This is a proof of concept, not a polished product. But it shows the infrastructure for fully local AI-assisted development is functional today for developers willing to invest in the hardware and configuration.
