Related ToolsClaude CodeCursorChatgptContinueAider

llama.cpp Merges Full MCP Support with Agentic Loop and Tool Calls

Meta Llama
Image: Meta

llama.cpp Merges Full MCP Support with Agentic Loop and Tool Calls

What Happened

A massive pull request adding Model Context Protocol (MCP) client support to llama.cpp was merged on March 6, 2026. PR #18655, authored by contributor allozaur, added 15,285 lines of code across 147 changed files with 374 commits. This is one of the largest single feature additions in the project's history.

The merge brings a full MCP client implementation to llama-server's WebUI, including:

  • MCP server management with a server selector and settings cards showing capabilities and instructions
  • Tool calls with an agentic loop that lets local models call external tools, process results, and iterate automatically
  • MCP Prompts with a picker UI, argument forms, and prompt attachments
  • MCP Resources with a file browser, search, tree view, and resource previews
  • CORS proxy on the llama-server backend (enabled with --webui-mcp-proxy)
  • Processing stats UI for tracking agentic loop performance

The PR also included significant UI improvements: a collapsible content block for tool calls, improved code block rendering, a searchable dropdown component, and better markdown handling.

Why It Matters

MCP has become the standard protocol for connecting AI models to external tools. Anthropic introduced it with Claude, and it has since been adopted by Cursor, Continue, Claude Code, and dozens of other tools. But until now, if you were running models locally through llama.cpp, you were locked out of this ecosystem.

This changes that. Anyone running Llama, Mistral, Qwen, or any other GGUF model locally can now connect to the same MCP servers that work with commercial products. That means local models can browse files, query databases, call APIs, and execute multi-step agentic workflows - all through a standardized protocol.

For the local LLM community, this is a practical upgrade. You no longer need to choose between running models locally and having tool integration. The agentic loop means models can call a tool, read the result, decide what to do next, and repeat - the same pattern that makes Claude Code and Cursor's agent mode useful.

Our Take

This is the kind of infrastructure work that quietly shifts what is possible. MCP adoption has been fast in the commercial AI tool space, but open-source local inference has lagged behind. With llama.cpp being the most widely used local inference engine, this merge effectively brings the entire MCP tool ecosystem to self-hosted setups.

The practical impact depends on how well smaller local models handle tool calling. A 70B parameter model running on consumer hardware will not match Claude or GPT-4o at complex agentic tasks. But for simpler workflows - file operations, database queries, API calls with structured responses - local models with MCP could be genuinely useful, especially for developers who want tool integration without sending data to external APIs.

The 374-commit, 147-file scope of this PR is worth noting. This was not a minimal integration. The contributor built a full-featured MCP client with resource browsing, prompt management, and statistics tracking. That level of completeness suggests this will actually get used, not just demonstrated.

If you are already running llama.cpp locally, update and try --webui-mcp-proxy. If you have been holding off on local models because of the tooling gap, this is the update worth paying attention to.