LM Studio Adds MTP Speculative Decoding for Faster Local Inference

May 20, 2026

LM Studio, the desktop app for running AI models locally on your own hardware, has just shipped support for MTP speculative decoding, a technique that can get significantly more tokens per second out of the same hardware.

To see why this matters, start with the mechanics. Standard AI text generation produces one token (a word or word fragment) per forward pass through the model. Speculative decoding works around that bottleneck by drafting several tokens ahead and then verifying all of them in a single pass of the full model. Drafted tokens the model agrees with are kept, and the first one it rejects is replaced with the model's own choice, so output quality is unaffected while fewer full passes are needed. The "MTP" variant, short for multi-token prediction, uses extra prediction heads built into the model's own architecture instead of a separate, smaller draft model running alongside it. Models like DeepSeek-V3 and DeepSeek-R1 were trained with these extra heads, which can then double as a built-in drafter at inference time. The result is faster output without the memory cost of loading a second draft model into VRAM.
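A toy sketch can make the draft-and-verify loop concrete. The Python below is an illustration under simplifying assumptions, not LM Studio's or DeepSeek's implementation: `target` and `draft` are hypothetical stand-in functions that pick the next token greedily, and the verification step that a real engine runs as one batched forward pass is simulated with a loop and counted as a single pass. With MTP, the `draft` role would be played by the model's own extra heads rather than a separate model.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": given the tokens so far, return the next token greedily.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one full forward pass of the target per generated token."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 3) -> Tuple[List[int], int]:
    """Greedy draft-and-verify loop. Returns (tokens, number of target verification passes)."""
    seq = list(prompt)
    passes = 0
    while len(seq) - len(prompt) < max_new:
        # 1. Draft k tokens ahead with the cheap predictor.
        drafted, ctx = [], list(seq)
        for _ in range(k):
            tok = draft(ctx)
            drafted.append(tok)
            ctx.append(tok)
        # 2. Verify with the target. A real engine scores all k positions in ONE batched
        #    forward pass; this loop simulates that and is counted as a single pass.
        passes += 1
        accepted: List[int] = []
        for tok in drafted:
            expected = target(seq + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if tok != expected:        # first mismatch: stop accepting drafts
                break
        else:
            # Every draft matched, so the same pass also yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:len(prompt) + max_new], passes


# Usage with hypothetical toy functions.
def target(s: List[int]) -> int:
    return (s[-1] + 1) % 10  # "full model": count upward mod 10


def draft(s: List[int]) -> int:
    # "drafter": agrees with the target except when the context length is a multiple of 5
    return (s[-1] + 1) % 10 if len(s) % 5 else 0


out, passes = speculative_decode(target, draft, [0], max_new=20, k=3)
assert out == greedy_decode(target, [0], 20)  # same tokens as plain greedy decoding
print(f"{passes} verification passes instead of 20")
```

Because every drafted token is checked against the target, the sketch produces exactly the same tokens as plain greedy decoding; the saving is in the number of full passes, which shrinks as the drafter's guesses improve.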

Users testing the update on DeepSeek models report speedups of roughly 20-40% in tokens per second, depending on hardware. On a long generation, such as a 2,000-word draft or a complex code file, that adds up to a noticeably shorter wait for each response.
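To put the reported range in perspective, here is the arithmetic for the 2,000-word example. The inputs are assumptions, not figures from the article: about 1.3 tokens per English word and a hypothetical baseline of 20 tokens per second. Note that a throughput gain of G shortens generation time by 1 - 1/(1 + G), so +20% tokens per second saves about 16.7% of the time, not 20%.

```python
def generation_times(num_tokens: float, baseline_tps: float, gain: float) -> tuple:
    """Return (baseline_seconds, faster_seconds) for a fractional throughput gain,
    e.g. gain=0.2 means 20% more tokens per second."""
    baseline_s = num_tokens / baseline_tps
    faster_s = num_tokens / (baseline_tps * (1 + gain))
    return baseline_s, faster_s


WORDS = 2_000
TOKENS_PER_WORD = 1.3  # assumption: rough rule of thumb for English prose
BASELINE_TPS = 20.0    # assumption: hypothetical local throughput, not from the article

tokens = WORDS * TOKENS_PER_WORD
for gain in (0.20, 0.40):  # the 20-40% range reported by users
    base_s, fast_s = generation_times(tokens, BASELINE_TPS, gain)
    print(f"+{gain:.0%} tok/s: {base_s:.0f}s -> {fast_s:.0f}s "
          f"(saves {base_s - fast_s:.0f}s, {1 - fast_s / base_s:.1%} less time)")
```

With these assumed inputs it prints 130s -> 108s (saves 22s) for +20% and 130s -> 93s (saves 37s) for +40%. Real savings scale with your actual baseline speed and output length.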

The feature has been available in lower-level tools like llama.cpp for a while, so LM Studio users have been waiting for the app to catch up. With this release, it has. If you're running a DeepSeek model, it's worth updating right away. For models without native MTP heads, this update won't help: standard speculative decoding with a separate draft model is a different feature, and it isn't what shipped here.
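Whether a model has native MTP heads is a property of its architecture. As a rough, hypothetical check (not how LM Studio decides), you can look at a model's Hugging Face `config.json`: DeepSeek-V3-style configs declare their MTP modules with a `num_nextn_predict_layers` field. Other model families may use different field names, and converted GGUF or MLX files store metadata differently, so treat a negative result as inconclusive.

```python
import json


def has_native_mtp(config: dict) -> bool:
    """Heuristic: True if a DeepSeek-V3-style config declares at least one MTP layer.
    The field name is the one DeepSeek-V3 uses; other architectures may differ."""
    return int(config.get("num_nextn_predict_layers", 0)) > 0


# Hypothetical config excerpts, trimmed to the relevant fields.
deepseek_like = json.loads('{"model_type": "deepseek_v3", "num_nextn_predict_layers": 1}')
plain = json.loads('{"model_type": "llama"}')

print(has_native_mtp(deepseek_like))  # True
print(has_native_mtp(plain))          # False
```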
