Sumi: Open-Source Voice-to-Text Tool Built for AI Coding Workflows

March 9, 2026

Running three or four AI coding agents at the same time creates a surprising bottleneck: typing fast enough to keep up with all of them. That problem led a developer in Taiwan to build Sumi, an open-source voice-to-text tool that handles both speech recognition and text cleanup locally on your machine.

Sumi uses a two-stage pipeline. First, it transcribes your speech using either Whisper (the open-source model from OpenAI, available here in seven model sizes) or Qwen3-ASR, a newer speech recognition model from Alibaba. The developer wrote the entire inference pipeline (the code that runs the AI model to produce results) in pure Rust for speed, and quantized the Qwen3-ASR model (stored its weights at lower precision so it uses less memory, at a small cost in accuracy). According to the developer, Qwen3-ASR handles accented speech and regional dialects better than Whisper does.
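To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit quantization in Rust. It is an illustration only: the article does not say which quantization scheme or bit width Sumi uses, and the names `QuantizedTensor`, `quantize`, and `dequantize` are hypothetical.

```rust
/// Weights stored as 8-bit integers plus one shared scale factor.
/// (Hypothetical type for illustration; not taken from Sumi's code.)
pub struct QuantizedTensor {
    pub values: Vec<i8>,
    pub scale: f32,
}

/// Symmetric per-tensor quantization: map [-max_abs, max_abs] onto [-127, 127].
pub fn quantize(weights: &[f32]) -> QuantizedTensor {
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when every weight is zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let values = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { values, scale }
}

/// Recover approximate f32 weights; each one is off by at most about half a step.
pub fn dequantize(q: &QuantizedTensor) -> Vec<f32> {
    q.values.iter().map(|&v| v as f32 * q.scale).collect()
}
```

Each weight shrinks from four bytes (`f32`) to one (`i8`), which is where the memory saving comes from. The rounding step is where a small amount of accuracy can be lost.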

The second stage is what matters most for daily use. After transcription, a local language model cleans up the raw text: it fixes grammar, removes filler words, and polishes the output into something you can paste directly into a chat or code editor. Both stages run on your hardware, so nothing leaves your machine.
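The shape of that two-stage pipeline can be sketched in a few lines of Rust. This is a hedged illustration, not Sumi's actual code: the `Transcriber` and `Cleaner` traits, the `dictate` function, and both stand-in implementations are hypothetical. In particular, `FillerWordCleaner` uses a fixed word list where Sumi uses a local language model.

```rust
/// Stage 1: audio samples in, raw transcript out.
pub trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> String;
}

/// Stage 2: raw transcript in, cleaned text out.
pub trait Cleaner {
    fn clean(&self, raw: &str) -> String;
}

/// Stand-in for a speech model: ignores the audio and returns a canned transcript.
pub struct FakeTranscriber;

impl Transcriber for FakeTranscriber {
    fn transcribe(&self, _samples: &[f32]) -> String {
        "um so refactor the uh parser module".to_string()
    }
}

/// Stand-in for the language-model cleanup: drops a few filler words,
/// capitalizes the first letter, and appends a period.
pub struct FillerWordCleaner;

impl Cleaner for FillerWordCleaner {
    fn clean(&self, raw: &str) -> String {
        const FILLERS: [&str; 3] = ["um", "uh", "er"];
        let kept: Vec<&str> = raw
            .split_whitespace()
            .filter(|w| !FILLERS.contains(&w.to_lowercase().as_str()))
            .collect();
        let joined = kept.join(" ");
        let mut chars = joined.chars();
        match chars.next() {
            Some(first) => format!("{}{}.", first.to_uppercase(), chars.as_str()),
            None => String::new(),
        }
    }
}

/// Run both stages locally, one after the other.
pub fn dictate(t: &dyn Transcriber, c: &dyn Cleaner, samples: &[f32]) -> String {
    c.clean(&t.transcribe(samples))
}
```

Calling `dictate(&FakeTranscriber, &FillerWordCleaner, &[])` returns `"So refactor the parser module."`. Keeping the stages behind separate traits is one way to swap Whisper for Qwen3-ASR, or one cleanup model for another, without touching the rest of the pipeline.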

Who This Is Actually For

The obvious audience is developers who talk to AI coding assistants like Claude Code or Cursor. But the local-first approach matters for anyone handling sensitive information. Lawyers dictating case notes, medical professionals, and anyone else working with proprietary data get voice input without sending audio to a cloud API.

The tradeoff is setup complexity. You need a machine with enough power to run these models locally, and Rust-based tooling is still less polished than commercial alternatives like Wispr Flow or Dragon NaturallySpeaking. But for power users who want full control over their voice input pipeline and already have a capable GPU, Sumi fills a gap that commercial tools have not addressed: voice-to-text designed specifically for the workflow of talking to AI agents all day.

Sumi is available on GitHub under an open-source license.
