A Y Combinator W26 startup called RunAnywhere has open-sourced RCLI, a voice AI pipeline that runs entirely on Apple Silicon with no cloud dependency. The project claims to outperform every major local inference tool on Mac hardware, including llama.cpp, Apple's own MLX framework, and Ollama.
The core of the project is MetalRT, a custom inference engine built on hand-tuned shaders for Metal, Apple's low-level GPU programming interface. By skipping the overhead of general-purpose ML frameworks, the founders say they've squeezed out meaningfully faster inference across LLMs, speech-to-text, and text-to-speech. RCLI wraps this into a complete voice AI pipeline: microphone input to spoken response, all on-device, no API keys required.
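To make the mic-to-speech flow concrete, here is a minimal sketch of a three-stage voice pipeline (speech-to-text, then a language model, then text-to-speech). It is not RCLI's code or API: the class, the function names, and the stand-in stages are all hypothetical, chosen only to show how the stages chain together on one machine.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these names come from RCLI or MetalRT.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    """Chains three local stages: audio in, audio out."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, mic_audio: bytes) -> bytes:
        transcript = self.stt(mic_audio)  # speech-to-text
        reply = self.llm(transcript)      # language model generates a reply
        return self.tts(reply)            # text-to-speech


# Stand-in stages so the sketch runs without any model or microphone.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
spoken_reply = pipeline.respond(b"hello")
```

In a real on-device setup, each stand-in would be replaced by a locally running model, which is why no network call or API key appears anywhere in the chain.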
Installation runs through Homebrew, and the project is MIT-licensed on GitHub.
How It Stacks Up
The local inference space on Mac has gotten crowded. Ollama made local LLMs dead simple to run. Apple's MLX gave researchers a PyTorch-like framework optimized for the company's own chips. And llama.cpp remains the workhorse that started the whole local LLM movement. RunAnywhere is betting that none of them pushes Apple's Metal GPU capabilities hard enough.
The specific benchmark claims haven't been independently verified yet, and the project is early-stage. But the approach is sound: Metal has capabilities that general-purpose runtimes for ONNX or GGUF models don't fully exploit, and there's real headroom in writing GPU kernels specifically for Apple's unified memory architecture.
For Mac users who want fully offline voice assistants or local AI pipelines without touching a cloud API, RCLI is worth testing. The no-API-keys, no-cloud angle is the real selling point. Performance claims aside, having a single brew install that gives you mic-to-speech AI on a MacBook is genuinely useful for privacy-conscious users and developers building offline-first applications.