128GB of unified memory in a mini PC. AMD's Halo Box, built around the Ryzen AI Max+ 395 processor, is aimed squarely at users who want to run large AI language models on their own hardware without buying server equipment.
The unified memory spec is the key detail here. In a traditional desktop setup, the GPU has its own separate memory pool - usually 8GB to 24GB on consumer graphics cards. For fast inference, an AI language model needs to fit entirely in this memory, which is why most people can only run small or heavily compressed models locally. Unified memory means the CPU and GPU share the same pool. At 128GB, you can load a 70B parameter model (like Meta's Llama 3.1 70B) at 8-bit precision, which takes roughly 70GB for the weights, with room to spare; the unquantized 16-bit weights need about 140GB and do not fit. Push into more aggressively quantized versions - compressed formats that trade some quality for much smaller file sizes - and models well beyond 70B come into reach. A 405B-scale model still needs around 200GB at 4-bit, so it fits only at very low bit widths, where the quality loss is no longer small.
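A rough way to check what fits in a 128GB pool is weights-only arithmetic: parameter count times bits per weight. The sketch below assumes exactly that and nothing more; the function names and the default 128GB budget are illustrative rather than taken from any tool, and the estimate ignores the context cache, activations, and runtime overhead.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 10**9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_memory(params_billions: float, bits_per_weight: float,
                   budget_gb: float = 128.0) -> bool:
    """Weights-only check against a memory budget (assumed 128 GB by default)."""
    return weight_footprint_gb(params_billions, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    # Compare two model sizes at common bit widths.
    for label, params in [("70B", 70), ("405B", 405)]:
        for bits in (16, 8, 4):
            size = weight_footprint_gb(params, bits)
            verdict = "fits" if fits_in_memory(params, bits) else "does not fit"
            print(f"{label} at {bits}-bit: ~{size:.0f} GB, {verdict} in 128 GB")
```

Usable capacity is lower than the headline figure in practice, since the operating system and the model's context cache draw from the same pool.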
The Competition Is Apple
The obvious comparison is Apple's M-series chips, which have used a similar unified memory design for years. An M4 Max MacBook Pro with 128GB currently runs $4,000-$5,000. AMD hasn't announced Halo Box pricing yet, but if it comes in meaningfully cheaper and real-world performance holds up, it gives Windows and Linux users a genuine local AI machine without paying Apple's premium.
A demo unit has been spotted running Ubuntu, which makes sense - many open-source model serving tools (like vLLM) are built Linux-first. The Ryzen AI Max+ 395 also includes a dedicated neural processing unit (NPU) built directly into the chip, rated at 50 TOPS (trillions of operations per second).
The real question is inference speed on actual workloads - how many tokens per second the model generates in practice, not on a spec sheet. Apple's software stack for running models on its own chips has years of optimization behind it. AMD's local AI compute story on consumer silicon is newer territory, and independent benchmarks will matter far more than announcement photos.
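Generation speed is usually reported as tokens per second, and it is simple to measure for yourself. The sketch below is a minimal example, assuming the model runtime exposes generated tokens as a Python iterable; `fake_stream` is a hypothetical stand-in for a real model, and the injectable `clock` parameter exists only to make the function easy to test.

```python
import time
from typing import Callable, Iterable, Iterator


def tokens_per_second(stream: Iterable[str],
                      clock: Callable[[], float] = time.perf_counter) -> float:
    """Consume a token stream and return the average tokens per second."""
    start = clock()
    count = 0
    for _ in stream:
        count += 1
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return count / elapsed


def fake_stream(n_tokens: int, delay_s: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields n_tokens with a fixed delay."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    rate = tokens_per_second(fake_stream(50, delay_s=0.01))
    print(f"{rate:.1f} tokens/s")
```

Because the timer starts before the first token arrives, this figure includes prompt processing time; benchmarks often report that separately from steady-state generation speed.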