One checkpoint file. Three model sizes. No retraining to switch between them.
That's the design of Star Elastic, a model release from NVIDIA AI that packages 30B, 23B, and 12B parameter reasoning models into a single saved file. The trick it's built around: you can "slice" out a smaller version without any additional training or fine-tuning (adjusting the model on new data).
A few terms worth unpacking. A "checkpoint" is simply a saved AI model file you download and load. "30B/23B/12B" refers to billions of parameters - the rough measure of a model's size and capability. More parameters generally mean better output but also more GPU memory required to run the model. "Reasoning models" are specifically trained to think through problems in steps rather than fire back a quick answer. And "zero-shot slicing" is the name for the trick described above: the smaller models are extracted from the larger one at inference time, with no separate training run.
The clearest analogy: adaptive video streaming. One upload serves you 4K, 1080p, or 720p depending on your connection - you never juggle three separate files. Star Elastic does the same thing with model weights.
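To make the slicing idea concrete, here is a toy sketch of nested weights in a tiny two-layer network. It is not Star Elastic's actual architecture or extraction procedure, which this article does not detail. The function names, the weights, and the choice to keep the first k hidden units are all illustrative assumptions.

```python
# Toy illustration only: a small model nested inside a larger one.
# Keeping the first k hidden units is an assumed slicing rule,
# not NVIDIA's documented method.

def mlp(x, w_in, w_out):
    """Two-layer network with ReLU. w_in has one row per hidden unit."""
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, row))) for row in w_in]
    return [sum(h * w for h, w in zip(hidden, row)) for row in w_out]

def slice_mlp(w_in, w_out, k):
    """Return the sub-network that uses only the first k hidden units."""
    return w_in[:k], [row[:k] for row in w_out]

# Hypothetical weights: 2 inputs, 4 hidden units, 1 output.
W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
W_OUT = [[1.0, 2.0, 3.0, 4.0]]

x = [1.0, 2.0]
full = mlp(x, W_IN, W_OUT)                     # uses all 4 hidden units
small = mlp(x, *slice_mlp(W_IN, W_OUT, k=2))   # same stored weights, 2 units
print(full, small)
```

Both calls read from the same stored weights; the smaller model is just a view onto part of them. For a slice like that to be useful, the large model has to be trained so that its sub-networks also perform well, which is the part a release like this has to get right.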
What This Solves for Local Runners
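Running models locally is constrained almost entirely by GPU memory (VRAM). A 30B model quantized to roughly 4 bits per weight needs about 20GB of VRAM once runtime overhead is included - out of reach for most consumer cards. (At 16-bit precision, the weights alone would take about 60GB.) The 12B slice at the same quantization needs closer to 8GB, which fits on a mid-range gaming GPU.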
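As a rough check on those VRAM figures, the weight footprint is just parameter count times bytes per weight. The sketch below counts weights only; the KV cache, activations, and runtime overhead add more on top. The bit widths are assumptions for illustration, not published Star Elastic specifications.

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

for size in (30, 23, 12):
    fp16 = estimate_weight_gb(size, 16)   # 2 bytes per weight
    q4 = estimate_weight_gb(size, 4)      # assumed 4-bit quantization
    print(f"{size}B: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit (weights only)")
```

At 4 bits the 30B weights come to about 15GB and the 12B weights to about 6GB, so totals near 20GB and 8GB are plausible once overhead is added.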
Today, managing multiple model sizes means downloading and tracking multiple separate files. A quantized 30B checkpoint alone can run 18-20GB. Star Elastic collapses that into one download and lets you pick your performance tier based on what your hardware can handle. Load the 23B slice for a deep reasoning task, then drop to 12B when you need faster turnaround, all from the same file.
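Picking a tier from available memory could look something like the sketch below. The footprint table, the headroom value, and the `pick_slice` helper are hypothetical, chosen for illustration. This is not a Star Elastic API, and the real loading interface may differ.

```python
# Hypothetical weight footprints in GB per slice, assuming ~4-bit weights.
SLICE_GB = {"30B": 15.0, "23B": 11.5, "12B": 6.0}

def pick_slice(free_vram_gb: float, headroom_gb: float = 2.0):
    """Return the largest slice whose weights fit with headroom, else None."""
    budget = free_vram_gb - headroom_gb   # reserve room for cache and overhead
    fitting = [(gb, name) for name, gb in SLICE_GB.items() if gb <= budget]
    return max(fitting)[1] if fitting else None

for vram in (24, 16, 8, 6):
    print(f"{vram} GB free -> {pick_slice(vram)}")
```

The headroom value matters in practice: long reasoning traces grow the KV cache, so a slice that barely fits at load time can still run out of memory mid-generation.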
For developers building local AI workflows - agentic pipelines, code assistants, anything where model-switching matters - this cuts the storage and file-management overhead significantly.
The Open Question on Quality
Models designed to be small are usually trained from scratch to be efficient at that size. A sliced-down version extracted from a larger model may not match a purpose-built 12B on all tasks, particularly if the slicing removes capacity from specialized reasoning pathways rather than distributing it evenly.
The comparison that would settle this - sliced Star Elastic 12B versus a standalone 12B reasoning model across standard benchmarks - isn't prominently available yet. That gap matters before committing to the format for production use. For experimentation and local testing, Star Elastic is genuinely worth pulling down. Whether the convenience of one file comes with a quality cost at the smaller sizes is the question to answer before relying on it.