7-8 tokens per second. That's the speed a developer hit running Qwen3's 30-billion-parameter model on a Raspberry Pi 5 with just 8GB of RAM, a single-board computer that costs about $80.
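The trick is Qwen3-30B-A3B's architecture. It's a mixture-of-experts (MoE) model, meaning it has 30 billion total parameters but only activates about 3 billion of them for any given token. That keeps the compute per token far below what a traditional dense 30B model would need. Memory is a separate problem: any expert may be needed for the next token, so all 30 billion parameters still have to be stored. At 16 bits per weight that is about 60GB for the weights alone, so fitting the model into 8GB presumably depends on aggressive quantization. The "A3B" in the name stands for "active 3 billion", the portion of the model doing work at any moment.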
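To make the active-versus-total distinction concrete, here is a minimal sketch of top-k expert routing followed by some back-of-the-envelope storage arithmetic. The expert count, the `TOP_K` value, and the random router scores are toy assumptions, not Qwen3's real configuration, and the helper names `top_k_route` and `weight_footprint_gb` are hypothetical. The bit widths are generic examples, not the quantization the developer actually used.

```python
import math
import random


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_EXPERTS = 16  # toy value, not the real model's expert count
TOP_K = 2         # toy value, not the real model's routing setting

random.seed(0)
# Stand-in router scores; a real router computes these from the token.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = top_k_route(logits, TOP_K)
print(f"experts used for this token: {[i for i, _ in chosen]} of {NUM_EXPERTS}")

# Compute scales with active parameters; storage scales with total parameters.
TOTAL_PARAMS = 30e9   # from the article
ACTIVE_PARAMS = 3e9   # from the article
for bits in (16, 8, 4, 2):
    stored = weight_footprint_gb(TOTAL_PARAMS, bits)
    touched = weight_footprint_gb(ACTIVE_PARAMS, bits)
    print(f"{bits:>2}-bit: store ~{stored:.1f} GB, touch ~{touched:.1f} GB per token")
```

Under these assumptions, only a fraction of the experts runs for each token, but the full set of weights still has to live somewhere. By this arithmetic the weights alone come to roughly 7.5GB at 2 bits per weight, which is why the bit width matters so much on an 8GB board.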
For context, 7-8 tokens per second translates to roughly 5-6 words per second of generated text, using the common rule of thumb that an English token is about three-quarters of a word. That's about as fast as most people read, so it's fast enough to feel conversational, though not instant. Running it on a board the size of a credit card with no GPU, no cloud connection, and no subscription fee is the real story here.
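The conversion from tokens to words is simple arithmetic, sketched below. The 0.75 words-per-token ratio is a rough rule of thumb for English text, not a measured property of Qwen3's tokenizer, and the 100-word reply length is an arbitrary example. The function names are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough average for English text


def words_per_second(tokens_per_second, words_per_token=WORDS_PER_TOKEN):
    """Convert a generation speed in tokens/s to approximate words/s."""
    return tokens_per_second * words_per_token


def seconds_for_reply(num_words, tokens_per_second, words_per_token=WORDS_PER_TOKEN):
    """Approximate time to generate a reply of num_words words."""
    return num_words / words_per_second(tokens_per_second, words_per_token)


for tps in (7, 8):
    wps = words_per_second(tps)
    wait = seconds_for_reply(100, tps)
    print(f"{tps} tokens/s ~ {wps:.2f} words/s, 100-word reply in ~{wait:.0f} s")
```

With this ratio, the reported range works out to between 5.25 and 6 words per second, so a short paragraph takes on the order of 15 to 20 seconds to generate.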
This matters for a specific and growing group of users: people who want AI that runs entirely on their own hardware. Privacy-sensitive use cases, offline deployments, embedded systems, and hobbyist projects all benefit from models that can run on cheap, low-power hardware. A Raspberry Pi pulling maybe 10 watts while running a capable language model is a very different proposition from renting GPU time or paying per-token API fees.
Qwen3-30B-A3B isn't going to replace Claude or GPT-4 for complex reasoning tasks. But for local assistants, document Q&A, code completion on a home server, or just experimenting with AI without a cloud bill, the floor for what hardware you need keeps dropping. Two years ago you needed a workstation GPU to run models this size. Now it runs on a board you can power with a phone charger.