Nine seconds. That's how long it takes to transcribe five minutes of audio using OpenAI's Whisper model running locally on an M2 MacBook Air, if you use the right setup. A detailed benchmark from the Yaps team puts hard numbers on the three ways to run Whisper on Apple Silicon, and the performance gaps are dramatic.
Whisper is OpenAI's open-source speech-to-text model (meaning it converts spoken audio into written text). It supports 99 languages and comes in five sizes, from the tiny 39-million-parameter model that needs about 1 GB of RAM to the full large-v3 with 1.55 billion parameters requiring roughly 10 GB.
Python vs. C++ vs. Core ML
The benchmarks used a 5-minute audio file on an M2 MacBook Air with 16 GB of RAM, testing the "small" and "medium" model sizes across three methods:
- Python (openai-whisper package): 48 seconds for the small model, 2 minutes 15 seconds for medium. Roughly 3.5x slower than whisper.cpp, and more than 5x slower than whisper.cpp with Core ML.
- whisper.cpp: Georgi Gerganov's C/C++ port with Metal GPU acceleration. 14 seconds for small, 38 seconds for medium.
- whisper.cpp with Core ML: The fastest option. 9 seconds for the small model (33x faster than real-time audio), 24 seconds for medium.
The Python route is the easiest to set up, though it still requires Homebrew, Python 3.11, and ffmpeg. whisper.cpp has to be compiled from source but rewards you with 3-4x better performance than Python. Adding Core ML (Apple's on-device machine learning framework) cuts transcription time by another 35-40%.
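Those ratios are easy to check against the raw timings. Here is a quick Python sketch that uses only the benchmark figures quoted above; the function and variable names are illustrative, not part of any Whisper tooling.

```python
# Wall-clock timings in seconds, copied from the benchmark figures above.
AUDIO_SECONDS = 5 * 60  # the 5-minute test file

TIMINGS = {
    "python": {"small": 48, "medium": 2 * 60 + 15},
    "whisper.cpp": {"small": 14, "medium": 38},
    "whisper.cpp + Core ML": {"small": 9, "medium": 24},
}


def realtime_factor(elapsed_seconds, audio_seconds=AUDIO_SECONDS):
    """Seconds of audio transcribed per second of wall-clock time."""
    return audio_seconds / elapsed_seconds


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second method is than the first."""
    return slow_seconds / fast_seconds


def time_saved_percent(before_seconds, after_seconds):
    """Percentage of run time removed when going from 'before' to 'after'."""
    return 100 * (before_seconds - after_seconds) / before_seconds


for size in ("small", "medium"):
    py = TIMINGS["python"][size]
    cpp = TIMINGS["whisper.cpp"][size]
    coreml = TIMINGS["whisper.cpp + Core ML"][size]
    print(
        f"{size}: whisper.cpp is {speedup(py, cpp):.1f}x faster than Python, "
        f"Core ML saves a further {time_saved_percent(cpp, coreml):.0f}%, "
        f"best case runs at {realtime_factor(coreml):.0f}x real time"
    )
```

Running it for both model sizes shows where the "3-4x" and "35-40%" ranges come from, and that the 33x real-time figure applies to the small model only.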
The Trade-Offs
Out of the box, local Whisper transcribes audio files only; live microphone input requires writing additional code. There's no system-wide integration, so you're working in the terminal. Model updates are manual. And Python dependency management on Mac remains its own special kind of frustrating.
For batch transcription of recordings where privacy matters (legal, medical, journalistic sources), local Whisper is a solid option. The large-v3 model produces near-professional accuracy. For real-time dictation in everyday apps, you'll still want a dedicated tool with system-level integration.
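For the batch case, a small wrapper script takes care of most of the terminal work. What follows is a minimal sketch, not a tested pipeline, and it rests on several assumptions that are not from the benchmark: that whisper.cpp was built with a binary at `whisper.cpp/build/bin/whisper-cli` (older builds name it `main`), that the small model sits at `whisper.cpp/models/ggml-small.bin`, and that the recordings are `.m4a` files. It converts each file to 16 kHz mono WAV with ffmpeg first, since that is the input format whisper.cpp expects.

```python
import subprocess
from pathlib import Path

# Assumed locations; adjust to wherever whisper.cpp was built on your machine.
WHISPER_BIN = Path("whisper.cpp/build/bin/whisper-cli")
MODEL_PATH = Path("whisper.cpp/models/ggml-small.bin")


def ffmpeg_command(source: Path, wav: Path) -> list[str]:
    """Build the ffmpeg call that converts a recording to 16 kHz mono 16-bit WAV."""
    return [
        "ffmpeg", "-y", "-i", str(source),
        "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le",
        str(wav),
    ]


def whisper_command(wav: Path, output_base: Path) -> list[str]:
    """Build the whisper.cpp call that writes <output_base>.txt."""
    return [
        str(WHISPER_BIN), "-m", str(MODEL_PATH),
        "-f", str(wav),
        "-otxt", "-of", str(output_base),
    ]


def transcribe_folder(folder: Path, pattern: str = "*.m4a", run=subprocess.run) -> list[Path]:
    """Transcribe every matching recording and return the transcript paths.

    `run` is injectable so the command sequence can be inspected without
    actually invoking ffmpeg or whisper.cpp.
    """
    transcripts = []
    for source in sorted(folder.glob(pattern)):
        wav = source.with_suffix(".wav")
        output_base = source.with_suffix("")  # whisper.cpp appends ".txt" itself
        run(ffmpeg_command(source, wav), check=True)
        run(whisper_command(wav, output_base), check=True)
        transcripts.append(source.with_suffix(".txt"))
    return transcripts


if __name__ == "__main__":
    for transcript in transcribe_folder(Path("recordings")):
        print(f"wrote {transcript}")
```

Everything runs locally through two subprocess calls per file, so the audio never leaves the machine. The `recordings` folder name is a placeholder.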