A developer has demonstrated a fully offline, multi-agent coding setup running four AI agents in parallel - all on local hardware with no API calls leaving the machine.
The setup combines three pieces: vLLM (an open-source inference server that runs large language models on local GPUs), gpt-oss-120b (OpenAI's open-weight model with roughly 120 billion parameters), and Claude Code's agent teams feature, which allows multiple AI agents to work on a codebase simultaneously and coordinate with each other.
The result, shown in a demo video, is four agents collaborating on coding tasks concurrently - what the developer community calls "vibecoding" - entirely offline.
How the Pieces Fit Together
Claude Code normally calls Anthropic's API for its AI capabilities. But its architecture supports swapping in alternative model providers, including local ones. By pointing Claude Code at a vLLM server running on the same machine, the developer replaced cloud API calls with local inference.
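The demo's exact configuration isn't published here, so the following is only a minimal sketch of what the redirection could look like. It assumes Claude Code is steered through environment variables such as ANTHROPIC_BASE_URL, that vLLM is listening on its default port 8000, and that the local server (or a translating proxy in front of it) accepts the request format Claude Code sends. The helper name local_claude_env and the placeholder token are hypothetical.

```python
import os


def local_claude_env(
    base_url: str = "http://localhost:8000",   # vLLM's default port; assumed here
    model: str = "openai/gpt-oss-120b",        # Hugging Face model id
    token: str = "local-placeholder",          # hypothetical dummy credential
) -> dict:
    """Return a copy of the current environment with overrides that
    point Claude Code at a local server instead of the cloud API."""
    env = dict(os.environ)
    env.update(
        {
            "ANTHROPIC_BASE_URL": base_url,
            "ANTHROPIC_AUTH_TOKEN": token,
            "ANTHROPIC_MODEL": model,
        }
    )
    return env


if __name__ == "__main__":
    env = local_claude_env()
    # Only the overrides are printed; nothing is launched here.
    for key in ("ANTHROPIC_BASE_URL", "ANTHROPIC_MODEL"):
        print(key, "=", env[key])
```

The resulting dictionary could then be passed as env= to subprocess.run when starting the claude command, keeping the override scoped to that one process rather than the whole shell.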
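vLLM handles the heavy lifting of running the 120B parameter model efficiently. A model this size needs substantial GPU memory, and how much depends on quantization (a technique that reduces model size by using lower-precision numbers, trading some accuracy for speed and memory savings). At 16 bits per parameter the weights alone would take about 240GB; at 4 bits they take about 60GB, before the extra memory needed to hold each agent's context. That puts the 4-bit model within reach of a single 80GB data-center GPU or roughly three to four 24GB consumer cards.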
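The memory arithmetic is easy to check by hand. The sketch below is a back-of-the-envelope estimate, not a measurement: it multiplies parameter count by bits per parameter, then pads the result with an overhead factor for context memory and runtime buffers. The 25% overhead figure is an illustrative assumption, and real usage varies with context length and the number of concurrent agents.

```python
import math


def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def gpus_needed(params: float, bits_per_param: float,
                gpu_gb: float, overhead: float = 0.25) -> int:
    """Rough GPU count: weights plus an assumed overhead fraction,
    divided by per-GPU memory and rounded up."""
    total_gb = weight_gb(params, bits_per_param) * (1 + overhead)
    return math.ceil(total_gb / gpu_gb)


if __name__ == "__main__":
    params = 120e9  # roughly 120 billion parameters
    for bits in (16, 8, 4):
        print(
            f"{bits:>2}-bit: {weight_gb(params, bits):.0f} GB of weights, "
            f"about {gpus_needed(params, bits, gpu_gb=24)} x 24GB GPUs"
        )
```

Under these assumptions the 4-bit case lands at four 24GB cards, or a single 80GB card, while the 16-bit case would need more than a dozen 24GB cards.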
Claude Code's agent teams feature - distinct from its simpler subagents - lets multiple agents run in parallel with their own context windows, coordinating on shared tasks. Each agent can read files, write code, and run commands independently.
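To make the idea of parallel agents with separate context windows concrete, here is a toy illustration. It is not how Claude Code implements agent teams; it only shows the general pattern of several workers pulling from a shared task list while each keeps its own private history. The fake_model function is a hypothetical stand-in for a request to the local inference server, and all names are invented for the example.

```python
import asyncio


async def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to the local inference server."""
    await asyncio.sleep(0)  # yield control so other agents can run
    return f"done: {prompt}"


async def agent(name: str, queue: asyncio.Queue, results: dict) -> list:
    """Pull tasks from the shared queue until it is empty.
    `context` is private to this agent; `results` is shared."""
    context = []
    while True:
        try:
            task = queue.get_nowait()
        except asyncio.QueueEmpty:
            return context
        reply = await fake_model(task)
        context.append((task, reply))  # per-agent history
        results[task] = name           # shared record of who did what


async def run_team(tasks: list, n_agents: int = 4):
    """Run n_agents concurrently over a shared list of tasks."""
    queue = asyncio.Queue()
    for task in tasks:
        queue.put_nowait(task)
    results = {}
    contexts = await asyncio.gather(
        *(agent(f"agent-{i}", queue, results) for i in range(n_agents))
    )
    return results, contexts


if __name__ == "__main__":
    todo = ["write parser", "add tests", "fix lint", "update docs", "refactor io"]
    results, contexts = asyncio.run(run_team(todo))
    for task, owner in results.items():
        print(f"{owner} handled: {task}")
```

In a real setup every agent's request would hit the same vLLM server, which is why the server's ability to batch concurrent requests matters for this kind of workload.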
What This Means for Local AI Development
Running AI coding agents locally has a few clear advantages: no API costs, no data leaving your network, and no rate limits. For companies with strict data policies or developers working on sensitive codebases, offline operation removes a major barrier to using AI coding tools.
The trade-off is hardware cost and model quality. A 120B open-source model is capable, but current cloud-hosted models from Anthropic and OpenAI still outperform open-source alternatives on complex coding tasks. The gap is closing, though, and for straightforward coding work the difference may not matter.
The setup also required switching to Linux, which is still the path of least resistance for local AI inference. Windows and macOS support for tools like vLLM exists but remains rougher around the edges.
This is a proof of concept, not a polished product. But it shows the infrastructure for fully local AI-assisted development is functional today for developers willing to invest in the hardware and configuration.