LM Studio, the desktop app for running AI models locally on your own hardware, just shipped support for MTP speculative decoding - a technique that can push significantly more tokens per second out of the same hardware.
To understand why this matters, here's the mechanic: standard AI text generation produces one token (a word or word fragment) per forward pass through the model. Speculative decoding eases that bottleneck by cheaply drafting several tokens ahead, then having the main model verify them all in a single pass. Drafts that match what the model would have chosen are kept, and the first mismatch is replaced with the model's own token, so the result stays faithful to the main model's output. The "MTP" variant - multi-token prediction - is built directly into the model rather than relying on a separate, smaller draft model running alongside it. Models like DeepSeek-V3 and DeepSeek-R1 were trained with a multi-token prediction objective, and the extra prediction modules that training produces can be reused as the drafter. The result is faster output with far less extra VRAM than loading a second model would require.
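The draft-and-verify loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not LM Studio's or DeepSeek's implementation: `target_next` and `draft_next` are hypothetical stand-ins for the main model and its MTP drafter, decoding is greedy, and the verification loop only simulates what a real engine does in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Hypothetical main model: a deterministic toy next-token rule."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[Token]) -> Token:
    """Hypothetical drafter: agrees with the target except at every 4th position."""
    guess = target_next(context)
    return guess if len(context) % 4 else (guess + 1) % 50


def greedy_decode(prompt: List[Token], n_new: int, target: NextToken) -> Tuple[List[Token], int]:
    """Baseline: one target pass per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out, n_new


def speculative_decode(
    prompt: List[Token], n_new: int, k: int, target: NextToken, draft: NextToken
) -> Tuple[List[Token], int]:
    """Draft k tokens, verify them, keep the matching prefix plus one target token."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n_new:
        # 1. Draft k tokens ahead with the cheap predictor.
        ctx = list(out)
        drafted = []
        for _ in range(k):
            token = draft(ctx)
            drafted.append(token)
            ctx.append(token)

        # 2. Verify. A real engine scores all positions in ONE forward pass;
        #    this loop simulates that pass, so it is counted once.
        passes += 1
        ctx = list(out)
        accepted = []
        for token in drafted:
            expected = target(ctx)
            if token != expected:
                accepted.append(expected)  # first mismatch: use the target's token
                break
            accepted.append(token)
            ctx.append(token)
        else:
            accepted.append(target(ctx))  # all drafts accepted: one bonus token

        out.extend(accepted)
    return out[: len(prompt) + n_new], passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec, spec_passes = speculative_decode(prompt, 40, 3, target_next, draft_next)
    base, base_passes = greedy_decode(prompt, 40, target_next)
    print("identical output:", spec == base)
    print("target passes:", spec_passes, "vs", base_passes)
```

Because every accepted token is one the main model would have picked anyway, the speculative run produces the same sequence as plain greedy decoding while calling the main model fewer times. How much faster that is in practice depends on how often the drafts are accepted.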
Users testing the update on DeepSeek models are reporting 20-40% more tokens per second, depending on hardware. On a long generation - a 2,000-word draft, say, or a complex code file - that adds up to real time saved on every response.
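To put the reported range in wall-clock terms, here is a back-of-the-envelope calculation. The numbers are illustrative assumptions, not measurements: a baseline of 20 tokens per second and roughly 1.3 tokens per English word are both hypothetical inputs you should replace with your own.

```python
def seconds_saved(n_tokens: int, baseline_tps: float, speedup: float) -> float:
    """Time saved when throughput rises by `speedup` (0.2 means 20% more tokens/s)."""
    baseline_time = n_tokens / baseline_tps
    faster_time = n_tokens / (baseline_tps * (1 + speedup))
    return baseline_time - faster_time


# Assumed inputs (hypothetical, adjust to your own setup):
WORDS = 2_000
TOKENS_PER_WORD = 1.3   # rough rule of thumb, varies by tokenizer and text
BASELINE_TPS = 20.0     # tokens per second before the update

n_tokens = round(WORDS * TOKENS_PER_WORD)
for speedup in (0.20, 0.40):
    saved = seconds_saved(n_tokens, BASELINE_TPS, speedup)
    print(f"{speedup:.0%} faster: about {saved:.1f} s saved on {n_tokens} tokens")
```

Under those assumptions, a 2,000-word response that would take a little over two minutes finishes roughly 22 to 37 seconds sooner. Faster baseline hardware shrinks the absolute saving, since the same percentage applies to a shorter wait.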
The feature has been available in lower-level tools like llama.cpp for a while, so LM Studio users have been waiting for the gap to close. It's now closed. If you're running a DeepSeek model, updating is worth doing immediately. For models without native MTP heads, this specific update won't help - standard speculative decoding with a separate draft model remains a different feature and isn't what shipped here.