Tenstorrent's $9,999 QuietBox 2: 480 RISC-V Cores for Local AI Inference

476 tokens per second on Llama 3.1 70B. That is the headline number from Tenstorrent's QuietBox 2, a liquid-cooled desktop workstation that runs entirely on RISC-V processors and plugs into a standard 120-volt wall outlet. No rack. No server room. No specialized power.

The machine ships in Q2 2026 starting at $9,999.

What Is Inside

Four Blackhole ASICs (application-specific integrated circuits - chips designed for one job) operate as a unified mesh, delivering 480 Tensix cores and 2,654 TFLOPS of compute at BlockFP8 precision (a number format optimized for AI workloads that trades a small amount of accuracy for major speed gains). Each Blackhole card carries 32GB of GDDR6 memory, totaling 128GB for the system, plus 256GB of DDR5 system memory.
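The system totals above are simple multiples of the per-card figures, and a quick sanity check makes the breakdown explicit. This is a minimal sketch using only the numbers quoted in this article; the even per-chip split is an assumption (it presumes four identical chips), and the variable names are illustrative, not anything from Tenstorrent's tooling.

```python
# Figures quoted in the article for the QuietBox 2.
NUM_CHIPS = 4
GDDR6_PER_CARD_GB = 32
TOTAL_TENSIX_CORES = 480
TOTAL_TFLOPS_BLOCKFP8 = 2654


def per_chip(total: float, chips: int = NUM_CHIPS) -> float:
    """Split a system-wide total evenly across chips (assumes identical chips)."""
    return total / chips


total_gddr6_gb = NUM_CHIPS * GDDR6_PER_CARD_GB
cores_per_chip = per_chip(TOTAL_TENSIX_CORES)
tflops_per_chip = per_chip(TOTAL_TFLOPS_BLOCKFP8)

print(f"GDDR6 total: {total_gddr6_gb} GB")                   # 128 GB, matching the article
print(f"Tensix cores per chip: {cores_per_chip:.0f}")         # 120
print(f"BlockFP8 TFLOPS per chip: {tflops_per_chip:.1f}")     # 663.5
```

The 128GB result matches the stated system total; the per-chip core and TFLOPS figures are derived here, not quoted by Tenstorrent.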

The key architectural decision: Tenstorrent pairs on-chip SRAM (fast memory built directly into the processor) with the GDDR6 described above, instead of relying on HBM (High-Bandwidth Memory), the expensive stacked memory that NVIDIA's data-center GPUs depend on and that has been in chronic short supply. Avoiding HBM sidesteps that supply bottleneck, which matters both for pricing and for actually being able to ship units on schedule.

Idle power consumption dropped 50% from the previous generation, and the liquid cooling keeps it quiet enough for a desk. "QuietBox" is literal, not marketing.

Benchmark Numbers

Beyond the Llama 70B result, Tenstorrent claims the QuietBox 2 can run models up to 120 billion parameters entirely on-device. The company demonstrated local image generation with Flux, video synthesis with Wan 2.2, and a protein structure prediction (using Boltz-2) that folded a 686-amino-acid protein in 49 seconds. For context, that same task takes roughly 45 minutes on a modern CPU - about 55 times slower.
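Two bits of arithmetic sit behind these claims: the speedup ratio, and whether a model of a given size plausibly fits in 128GB. The sketch below checks both. The speedup uses only the times quoted above. The footprint estimate is a rough assumption, not a Tenstorrent figure: it treats an 8-bit format as about one byte per parameter and ignores block-exponent overhead, activations, and KV cache, all of which add to real memory use.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


def weight_footprint_gb(params_billions: float, bytes_per_param: float = 1.0) -> float:
    """Approximate weight storage in GB.

    ASSUMPTION: ~1 byte per parameter for an 8-bit format. Weights only;
    activations, KV cache, and format overhead are not counted.
    """
    # (params_billions * 1e9 params) * bytes_per_param / (1e9 bytes per GB)
    return params_billions * bytes_per_param


# Protein folding: roughly 45 minutes on a CPU vs 49 seconds on the QuietBox 2.
cpu_seconds = 45 * 60
quietbox_seconds = 49
print(f"Speedup: {speedup(cpu_seconds, quietbox_seconds):.1f}x")  # 55.1x

# Do 70B and 120B models fit in the 128GB of GDDR6 under the 1-byte assumption?
GDDR6_TOTAL_GB = 128
for size_b in (70, 120):
    gb = weight_footprint_gb(size_b)
    print(f"{size_b}B params: ~{gb:.0f} GB of weights, fits: {gb <= GDDR6_TOTAL_GB}")
```

Under that assumption a 120-billion-parameter model needs about 120GB for weights alone, which leaves little headroom in 128GB and is consistent with 120B being quoted as the upper limit. At 16-bit precision (2 bytes per parameter) the same model would not fit.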

Fully Open-Source Software Stack

This is where it gets interesting for developers who are tired of working inside black boxes. The entire software stack is open-source: TT-Forge (AI compiler), TT-Metalium (low-level SDK), TT-LLK (kernel software), and TT-Studio (development environment). It runs Ubuntu 24.04 and supports PyTorch, ONNX, TensorFlow, JAX, and PaddlePaddle.

Tenstorrent's CEO Jim Keller - the chip architect behind AMD's Zen, Apple's A4/A5, and Tesla's self-driving silicon - put it plainly: "Every layer of QuietBox 2's software is open source. This is not just an open API on a black box; it is full-stack visibility."

Who This Is For

At $10K, this is not an impulse buy. But for teams running local inference (generating AI outputs on their own hardware instead of paying per-token to a cloud provider) on sensitive data, or researchers who need to see and modify every layer of the stack, the economics could work. Running Llama 3.1 70B at 476 tokens/sec locally replaces per-token API fees with a one-time hardware cost plus electricity, and 128GB of on-device memory means you are not constantly swapping models in and out.
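Whether the economics work depends on how much you would otherwise spend on a cloud API. A rough break-even estimate can be sketched as follows. The purchase price and throughput come from this article; the cloud price and utilization are hypothetical placeholders to be replaced with your own numbers, and the calculation ignores electricity, maintenance, and the possibility that 476 tokens/sec is a batched aggregate and not a single-stream rate.

```python
PURCHASE_PRICE_USD = 9_999   # from the article
TOKENS_PER_SECOND = 476      # from the article (Llama 3.1 70B)

# HYPOTHETICAL inputs: substitute your real cloud pricing and workload.
HYPOTHETICAL_CLOUD_PRICE_PER_MILLION_USD = 1.00
HYPOTHETICAL_UTILIZATION = 0.5   # fraction of each day the box is generating


def breakeven_tokens(purchase_price_usd: float, cloud_price_per_million_usd: float) -> float:
    """Tokens you must generate before cloud fees would equal the purchase price."""
    if cloud_price_per_million_usd <= 0:
        raise ValueError("cloud price must be positive")
    return purchase_price_usd / cloud_price_per_million_usd * 1_000_000


def breakeven_days(purchase_price_usd: float, cloud_price_per_million_usd: float,
                   tokens_per_second: float, utilization: float) -> float:
    """Days of operation needed to generate the break-even token count."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens = breakeven_tokens(purchase_price_usd, cloud_price_per_million_usd)
    seconds = tokens / (tokens_per_second * utilization)
    return seconds / 86_400  # seconds per day


days = breakeven_days(PURCHASE_PRICE_USD, HYPOTHETICAL_CLOUD_PRICE_PER_MILLION_USD,
                      TOKENS_PER_SECOND, HYPOTHETICAL_UTILIZATION)
print(f"Break-even after about {days:.0f} days at the assumed price and utilization")
```

The result scales inversely with both cloud price and utilization: a team paying twice as much per token, or keeping the box twice as busy, breaks even in half the time.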

Tenstorrent is also collaborating with Razer on a separate consumer AI accelerator built on Tenstorrent's Wormhole chip, which suggests the company is serious about building a product line, not just a one-off showpiece. Preorders are open at tenstorrent.com.