
Apple M5 Max Benchmarks Show 36% GPU Gains, but LLM Numbers Are Still Missing

March 10, 2026

The first independent benchmarks of Apple's M5 Max are out, and they tell half the story local-LLM enthusiasts actually care about.

Creative Strategies published a detailed teardown of the M5 Max's chiplet architecture and performance numbers. The CPU results are solid: Geekbench 6 single-core hit 4,246 (up from 3,895 on M4 Max), and multi-core reached 28,728 versus 25,984. That is roughly a 9% single-core and 11% multi-core generational bump, consistent with what Apple has delivered in recent years.

The GPU is where things get more interesting. Cinebench GPU scores jumped to 93,577, a 36% improvement over the M4 Max's 68,590. That matters for anyone running local AI models, since many inference engines (the software that actually runs a model on your hardware) offload heavily to the GPU.
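A quick way to sanity-check the quoted gains is to recompute them from the raw scores. The sketch below uses only the numbers reported in the article. The helper name `uplift_pct` is hypothetical.

```python
# Scores as reported in the article: (M4 Max, M5 Max).
SCORES = {
    "Geekbench 6 single-core": (3_895, 4_246),
    "Geekbench 6 multi-core": (25_984, 28_728),
    "Cinebench GPU": (68_590, 93_577),
}


def uplift_pct(old: float, new: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for name, (m4_max, m5_max) in SCORES.items():
        print(f"{name}: {uplift_pct(m4_max, m5_max):.1f}%")
```

This prints 9.0% for single-core, 10.6% for multi-core and 36.4% for the GPU. The multi-core gain therefore lands closer to 11% than 10%, and the 36% GPU figure checks out.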

The Chiplet Design

The M5 Max splits its silicon into two tiles: one CPU-dominant, one GPU-dominant. This is Apple's first consumer chiplet design, using TSMC's SoIC-MH packaging to bond the tiles together. The practical benefit is thermal: the CPU and GPU no longer heat each other up when both are working hard. Sustained CPU power sits around 50W under load, with the whole system idling at just 7.1W (down from 7.6W on M4 Max).

The chip runs 18 CPU cores in a 6+12 configuration (six high-performance "super" cores, twelve standard performance cores) with no efficiency cores at all. That is a different approach from the base M5, which keeps efficiency cores for battery life.

Where Are the LLM Benchmarks?

Here is the frustrating part. The article acknowledges that AI benchmarking has two stages that matter: prefill (how fast the chip processes your prompt) and decode (how fast it generates tokens back to you). But it does not publish actual tokens-per-second numbers. The author notes that M4 Max comparisons are "coming soon," and the M3 Ultra head-to-head that many prospective buyers want is absent entirely.
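To make the two stages concrete, here is a minimal sketch of one common way to derive prefill and decode throughput from a single run's timings. The `GenerationTiming` class and the function names are hypothetical. The example numbers are invented for illustration and are not M5 Max measurements.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    """Timings for one generation request (hypothetical structure)."""
    prompt_tokens: int     # tokens in the prompt fed to the model
    generated_tokens: int  # tokens the model produced, including the first
    ttft_s: float          # time to first token in seconds (dominated by prefill)
    total_s: float         # wall-clock time for the whole request in seconds


def prefill_tps(t: GenerationTiming) -> float:
    """Prompt-processing speed: prompt tokens divided by time to first token."""
    if t.ttft_s <= 0:
        raise ValueError("ttft_s must be positive")
    return t.prompt_tokens / t.ttft_s


def decode_tps(t: GenerationTiming) -> float:
    """Generation speed for the tokens emitted after the first one."""
    decode_s = t.total_s - t.ttft_s
    if decode_s <= 0 or t.generated_tokens < 2:
        raise ValueError("need at least two generated tokens and a positive decode time")
    # The first token's cost is already inside TTFT, so it is excluded here.
    return (t.generated_tokens - 1) / decode_s


if __name__ == "__main__":
    # Invented example numbers, not measurements of any chip.
    run = GenerationTiming(prompt_tokens=2000, generated_tokens=257, ttft_s=2.0, total_s=10.0)
    print(f"prefill: {prefill_tps(run):.0f} tok/s")  # 1000 tok/s
    print(f"decode:  {decode_tps(run):.0f} tok/s")   # 32 tok/s
```

Prefill is usually compute-bound, so it benefits more directly from extra GPU throughput. Decode on large models is usually limited by memory bandwidth, so a GPU score alone does not predict it. Benchmark tools may draw the line between the two phases slightly differently.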

For anyone considering the M5 Max specifically for running models like Llama 3 70B or Mixtral locally, the 36% GPU uplift and improved thermals are promising signals. But promising signals are not benchmarks. The M3 Ultra, with its 512GB unified memory ceiling, remains the chip to beat for large local models, and we still do not have a direct comparison.
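Memory capacity matters because the weights alone of a 70B model are large. The sketch below estimates weight size at a few common precisions. It assumes nominal bytes-per-parameter values and ignores KV cache, activations and OS overhead. The function names are hypothetical.

```python
# Nominal bytes per parameter for common weight formats.
# Real quantized formats add some overhead (scales, zero points), so treat these as lower bounds.
BYTES_PER_PARAM = {
    "FP16/BF16": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Size of the weights alone in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes directly.
    KV cache, activations, and OS overhead are NOT included.
    """
    return params_billion * bytes_per_param


def fits(params_billion: float, bytes_per_param: float, memory_gb: float) -> bool:
    """True if the weights alone fit within the given memory budget."""
    return weight_gb(params_billion, bytes_per_param) <= memory_gb


if __name__ == "__main__":
    for fmt, bpp in BYTES_PER_PARAM.items():
        print(f"70B @ {fmt}: ~{weight_gb(70, bpp):.0f} GB of weights")
```

At 16-bit precision, a 70B model needs about 140 GB for its weights before any runtime overhead. That is why very large unified memory pools matter for local inference. At 4-bit quantization the weights shrink to roughly 35 GB plus format overhead.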

The raw compute gains are real. But the question that actually matters to the local-AI crowd, namely how many tokens per second the chip delivers on a 70B-parameter model, remains unanswered.
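Until measured numbers arrive, a common back-of-envelope bound for decode speed is memory bandwidth divided by the bytes read per generated token. The sketch below is a rough ceiling under stated assumptions, not a prediction for the M5 Max. The bandwidth value is a placeholder, and `decode_ceiling_tps` is a hypothetical name.

```python
def decode_ceiling_tps(bytes_per_token_gb: float, bandwidth_gb_s: float) -> float:
    """Bandwidth-bound upper limit on decode speed, in tokens per second.

    Assumes a dense model whose weights must all be streamed from memory once per
    generated token (batch size 1). Real throughput will be lower because of
    KV-cache reads, compute overhead, and imperfect bandwidth utilization.
    """
    if bytes_per_token_gb <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("inputs must be positive")
    return bandwidth_gb_s / bytes_per_token_gb


if __name__ == "__main__":
    model_gb = 35.0    # ~70B parameters at 4-bit (weights only)
    bandwidth = 700.0  # HYPOTHETICAL bandwidth in GB/s, not a published M5 Max figure
    print(f"ceiling: ~{decode_ceiling_tps(model_gb, bandwidth):.0f} tok/s")  # ~20 tok/s
```

This bound depends on memory bandwidth rather than GPU compute. A 36% Cinebench gain therefore does not imply a 36% decode gain, which is one more reason real tokens-per-second measurements are needed.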
