IBM's Granite 4.0 Speech Model Fits 6 Languages in 1 Billion Parameters


One billion parameters. That's all IBM's new Granite 4.0 1B Speech model needs to transcribe and translate speech across six languages, and it just claimed the top spot on Hugging Face's OpenASR leaderboard.

The model is half the size of its predecessor (Granite Speech 3.3 2B), yet it handles automatic speech recognition and bidirectional speech translation in English, French, German, Spanish, Portuguese, and Japanese, the last being a new addition. It ships under the Apache 2.0 license, which permits commercial use as long as the license's attribution and notice terms are followed, and it runs natively in both Hugging Face Transformers and vLLM.

What makes this interesting for practitioners is the edge deployment angle. At 1B parameters, the model is small enough to plausibly run on phones, embedded devices, and local hardware without a GPU cluster. IBM is positioning it for enterprise use cases where sending audio to a cloud API isn't an option: think healthcare transcription, factory floor communication, or field service in areas with spotty connectivity.
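To put the size claim in rough numbers, here is a back-of-the-envelope sketch of how much memory the weights alone would need at different numeric precisions. It assumes exactly one billion parameters and ignores activations, audio buffers, and runtime overhead, so treat the results as a lower bound rather than a measured footprint. The function name and the precision table are illustrative, not part of any IBM tooling.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: exactly 1e9 parameters; activations and runtime overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Print the estimate for each precision in the table.
for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(1e9, precision):.1f} GB")
```

Under these assumptions, 16-bit weights come to about 2 GB and 4-bit weights to about 0.5 GB, which is the range where phones and embedded boards become realistic targets.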

The Benchmark Picture

Granite 4.0 1B Speech achieves competitive word error rates against models with significantly more parameters. Word error rate (WER) is the standard measure of transcription accuracy: the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript, so lower is better. Claiming #1 on the OpenASR leaderboard is notable, though benchmark rankings in speech recognition shift frequently as new models appear.
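For readers who have not computed the metric by hand, the following is a minimal sketch of word error rate as a word-level edit distance. It only lowercases and splits on whitespace; public leaderboards typically apply heavier text normalization (punctuation, number formats) before scoring, so numbers from this sketch are not directly comparable to published results. The function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.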

The model also introduces keyword list biasing, a feature that lets you feed it a list of specific names, acronyms, or domain terms so it recognizes them correctly. Anyone who has watched a transcription tool butcher company names or technical jargon knows why this matters.
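How the keyword list is passed to the model is not covered here, so the sketch below does not call the model at all. Instead it shows one simple way to check whether biasing is paying off: measure what fraction of your domain terms actually appear in a transcript, before and after enabling the feature. The function names, the tokenization rule, and the sample transcript are all assumptions made for illustration.

```python
import re


def _tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_recall(keywords: list[str], transcript: str) -> float:
    """Fraction of keywords that occur in the transcript as whole-word sequences."""
    if not keywords:
        raise ValueError("keywords must not be empty")
    words = _tokens(transcript)
    hits = 0
    for keyword in keywords:
        kw = _tokens(keyword)
        n = len(kw)
        # Slide a window of n words over the transcript and look for an exact match.
        if n and any(words[i:i + n] == kw for i in range(len(words) - n + 1)):
            hits += 1
    return hits / len(keywords)


# Hypothetical domain terms and a hypothetical transcript.
terms = ["Granite Guardian", "vLLM", "OpenASR"]
print(keyword_recall(terms, "We deployed Granite Guardian next to vLLM."))
```

Running the same check on transcripts produced with and without the keyword list gives a quick, task-specific signal that complements overall word error rate.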

IBM recommends pairing it with its Granite Guardian 3.3 8B model for deployments that need content safety filtering. That is a sensible addition for enterprise use, though an 8B companion is far larger than the 1B speech model itself and adds substantially to the total compute footprint.

For teams already running speech pipelines with Whisper or other open models, this is worth benchmarking on your own audio and vocabulary. The compact size and Apache 2.0 license remove much of the usual friction of running that evaluation.