1.15 gigabytes. That's smaller than many mobile games - and it's the full footprint of Bonsai 8B, a new open-source language model built around 1-bit quantization.
Standard language models store their parameters (the numerical values that encode what the model has learned) as 16-bit or 32-bit floating point numbers. Quantization compresses those numbers down - 8-bit, 4-bit, 2-bit - trading some precision for a smaller file. 1-bit quantization takes that to the limit: each parameter stores either -1 or +1, nothing in between. An 8 billion parameter model normally runs around 16GB at 16-bit precision (two bytes per parameter), or 4-5GB at the 4-bit compression most local model users run today. Bonsai 8B gets to 1.15GB.
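The arithmetic behind those sizes is easy to check. The sketch below is illustrative only: the function names are made up for this example, the single mean-absolute-value scale is an assumption rather than Bonsai 8B's actual scheme, and real model files carry extra metadata (scaling factors, any layers kept at higher precision), which is presumably why published sizes sit a little above the raw figures.

```python
def footprint_gb(n_params, bits_per_param):
    """Raw weight storage in gigabytes (10**9 bytes), ignoring file metadata."""
    return n_params * bits_per_param / 8 / 1e9


def quantize_1bit(weights):
    """Sign quantization: one bit per weight plus one shared scale.

    Assumption for illustration: the scale is the mean absolute weight.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    packed = bytearray((len(weights) + 7) // 8)  # 8 weights per byte
    for i, w in enumerate(weights):
        if w >= 0:  # bit 1 encodes +1, bit 0 encodes -1
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed), scale


def dequantize_1bit(packed, scale, n):
    """Recover n approximate weights, each equal to +scale or -scale."""
    return [scale if (packed[i // 8] >> (i % 8)) & 1 else -scale for i in range(n)]


# Raw footprint of an 8-billion-parameter model at different bit widths.
for bits in (16, 4, 1):
    print(f"{bits:>2}-bit: {footprint_gb(8e9, bits):.1f} GB")
```

At one bit per weight, 8 billion parameters come to 1.0GB of raw storage, so the reported 1.15GB leaves roughly 0.15GB for everything else in the file.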
Built by Firethering, the model is designed for hardware that can't run conventional local models - thin laptops, Raspberry Pis, edge devices (hardware deployed outside data centers, like sensors or embedded systems). The approach follows Microsoft's BitNet b1.58 research from 2024, which used ternary weights (-1, 0, or +1, about 1.58 bits per parameter) and established that models at roughly one bit per weight could work at meaningful scale.
The tradeoff is quality. 1-bit models consistently underperform well-quantized 4-bit models on standard tasks. Bonsai 8B is more compelling as a proof of how far compression can go than as a daily driver for anything requiring accurate output.
For developers building on-device AI or exploring edge deployment, it's worth running against your specific use case. The gap between 1-bit and 4-bit quality varies significantly by task - some workloads tolerate extreme compression well, others don't.
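One way to run that comparison is a small harness that scores both models on the same task-specific cases. This is a minimal sketch under stated assumptions: each model is wrapped as a plain Python callable from prompt to text (how you load Bonsai 8B or a 4-bit model is up to your runtime and is not shown), exact-match scoring stands in for whatever metric fits your task, and every name here is hypothetical.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]   # hypothetical wrapper: prompt in, text out
Case = Tuple[str, str]         # (prompt, expected answer)


def accuracy(model: Model, cases: List[Case]) -> float:
    """Fraction of cases where the model's answer matches exactly (case-insensitive)."""
    hits = sum(
        model(prompt).strip().lower() == expected.strip().lower()
        for prompt, expected in cases
    )
    return hits / len(cases)


def quality_gap(model_1bit: Model, model_4bit: Model, cases: List[Case]) -> float:
    """Positive values mean the 4-bit model scored higher on these cases."""
    return accuracy(model_4bit, cases) - accuracy(model_1bit, cases)


# Stand-in models for illustration only; replace with real inference calls.
def stub_4bit(prompt: str) -> str:
    return {"2+2=": "4", "Capital of France?": "Paris"}.get(prompt, "")


def stub_1bit(prompt: str) -> str:
    return {"2+2=": "4", "Capital of France?": "Lyon"}.get(prompt, "")


cases = [("2+2=", "4"), ("Capital of France?", "Paris")]
print(f"gap on this task: {quality_gap(stub_1bit, stub_4bit, cases):.2f}")
```

A gap near zero on your own cases suggests the workload tolerates 1-bit compression; a large gap suggests it doesn't.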