
Qwen3 30B Now Runs on a Raspberry Pi 5 at 7-8 Tokens Per Second

March 20, 2026


7-8 tokens per second. That's the speed a developer hit running Qwen3's 30-billion-parameter model on a Raspberry Pi 5 with just 8GB of RAM - a single-board computer that costs about $80.

The trick is Qwen3-30B-A3B's architecture. It's a mixture-of-experts (MoE) model, meaning it has 30 billion total parameters but activates only about 3 billion of them for any given token. That keeps the compute and memory traffic per token far below what a dense 30B model would need, although all 30 billion parameters still have to be stored somewhere. The "A3B" in the name stands for "active 3 billion" - the portion of the model doing work at any moment.
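To make the idea of "active" parameters concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind MoE layers. It is a toy with made-up sizes (8 experts, 2 active per token, scalar "experts") and one common weighting variant. It is not Qwen3's actual configuration or code, and the names `top_k_route` and `moe_forward` are hypothetical.

```python
import math
import random


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k selected experts; the others are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


def make_expert(scale):
    """Toy stand-in for an expert network: multiplies its input by a constant."""
    return lambda x: scale * x


# Illustrative sizes only (assumption, not taken from the model).
NUM_EXPERTS, TOP_K = 8, 2
experts = [make_expert(i + 1) for i in range(NUM_EXPERTS)]

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores

routes = top_k_route(logits, TOP_K)
output = moe_forward(1.0, experts, logits, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS

print("selected experts and weights:", routes)
print("layer output:", output)
print("fraction of experts used for this token:", active_fraction)
```

Every expert has to exist in memory because a different token may route to it, but each token only pays the compute cost of the few experts it is sent to.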

For context, 7-8 tokens per second translates to roughly 5-6 words per second of generated English text, going by the common rule of thumb of about three-quarters of a word per token. That's fast enough to feel conversational, though not instant. Running it on a board the size of a credit card with no GPU, no cloud connection, and no subscription fee is the real story here.
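The tokens-to-words conversion is simple arithmetic and easy to check. The sketch below assumes about 0.75 English words per token, which is a rough rule of thumb rather than a measured figure for Qwen3's tokenizer; the function names are illustrative.

```python
# Assumption: roughly 0.75 English words per token (rule of thumb,
# not measured for Qwen3's tokenizer).
WORDS_PER_TOKEN = 0.75


def words_per_second(tokens_per_second, words_per_token=WORDS_PER_TOKEN):
    """Convert a generation speed in tokens/s to approximate words/s."""
    return tokens_per_second * words_per_token


def seconds_for_reply(num_words, tokens_per_second,
                      words_per_token=WORDS_PER_TOKEN):
    """Approximate time to generate a reply of num_words words."""
    return num_words / words_per_second(tokens_per_second, words_per_token)


low = words_per_second(7)    # 5.25 words/s
high = words_per_second(8)   # 6.0 words/s
reply_time = seconds_for_reply(150, 7.5)

print(f"7-8 tokens/s is about {low:.2f}-{high:.2f} words/s")
print(f"a 150-word reply at 7.5 tokens/s takes about {reply_time:.0f} s")
```

Under that assumption, a 150-word answer takes a little under half a minute to finish, which matches the "conversational, though not instant" description.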

This matters for a specific and growing group of users: people who want AI that runs entirely on their own hardware. Privacy-sensitive use cases, offline deployments, embedded systems, and hobbyist projects all benefit from models that can run on cheap, low-power hardware. A Raspberry Pi pulling maybe 10 watts while running a capable language model is a very different proposition from renting GPU time or paying per-token API fees.

Qwen3-30B-A3B isn't going to replace Claude or GPT-4 for complex reasoning tasks. But for local assistants, document Q&A, code completion on a home server, or just experimenting with AI without a cloud bill, the floor for what hardware you need keeps dropping. Two years ago you needed a workstation GPU to run models this size. Now it runs on a board you can power with a phone charger.
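A back-of-the-envelope estimate shows why memory, not compute, is the tight constraint on an 8GB board. The sketch below counts the storage for the weights alone at a few bit widths. The bit widths are illustrative assumptions; the report does not say which quantization the developer used, and the estimate ignores the KV cache, activations, and the operating system.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 30e9  # total parameters, from the model name
RAM_GB = 8           # Raspberry Pi 5 configuration in the report

# Illustrative bit widths (assumption): 16-bit down to 2-bit weights.
estimates = {bits: weight_memory_gb(TOTAL_PARAMS, bits) for bits in (16, 8, 4, 2)}

for bits, gb in estimates.items():
    verdict = "fits" if gb < RAM_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:5.1f} GB ({verdict} in {RAM_GB} GB)")
```

By this estimate the weights only drop below 8GB at around 2 bits per weight, so a setup like the one described would need either very aggressive quantization or some of the weights read from storage on demand.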
