NVIDIA's research lab has released two open-weight models on Hugging Face: Cosmos 3 Nano, with 16 billion parameters, and Cosmos 3 Super, with 64 billion. Both accept combinations of text, images, video, and physical action trajectories as input, and each can generate video, images, audio, and action commands as output from a single model.
Most generative AI models specialize in one format: image generators handle images, and video models handle video. Cosmos 3 covers all of these formats in one model, which NVIDIA calls "omnimodal." The more distinctive capability, though, is the action command output. Cosmos 3 isn't primarily a media generation tool; it's designed as a simulation engine for Physical AI, meaning training robots and autonomous systems on realistic synthetic environments generated together with the movement instructions that go with them.
What the Action Command Output Actually Does
A robot or autonomous vehicle can't learn from raw video alone; it needs to know which physical actions correspond to what it sees. Cosmos 3 can generate that link directly: given a video input, it can output action trajectory data that tells a robotic arm or vehicle controller which movements to execute. That makes it useful for training physical systems in simulation before real-world deployment, which is significantly cheaper than collecting the same data with real hardware.
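To make the video-to-action link concrete, here is a minimal sketch of what one paired training sample might look like. Everything in it is hypothetical: the `ActionStep` and `SimulatedClip` schemas, the displacement-plus-gripper command format, and the 10 Hz command rate are illustrative assumptions, not Cosmos 3's actual output format. The point it shows is that each generated frame has to be matched to the command the controller would be executing at that moment, even when video and actions run at different rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionStep:
    """Hypothetical command: end-effector displacement in metres plus a gripper flag."""
    dx: float
    dy: float
    dz: float
    gripper_closed: bool


@dataclass
class SimulatedClip:
    """Hypothetical pairing of a generated clip with its action trajectory."""
    num_frames: int
    fps: int          # video frame rate
    action_hz: int    # rate at which the controller consumes commands
    actions: list[ActionStep]


def pair_frames_with_actions(clip: SimulatedClip) -> list[tuple[int, ActionStep]]:
    """Match every frame to the command in effect at that frame's timestamp."""
    pairs = []
    for frame in range(clip.num_frames):
        # Frame time is frame / fps seconds, so the active command index is
        # floor(frame / fps * action_hz); integer arithmetic avoids float rounding.
        step = frame * clip.action_hz // clip.fps
        if step >= len(clip.actions):
            raise ValueError(f"frame {frame} has no action command (needs step {step})")
        pairs.append((frame, clip.actions[step]))
    return pairs


# Usage: 6 frames at 30 fps cover 0.2 s, i.e. two commands at 10 Hz.
reach = ActionStep(dx=0.05, dy=0.0, dz=-0.02, gripper_closed=False)
grasp = ActionStep(dx=0.0, dy=0.0, dz=0.0, gripper_closed=True)
clip = SimulatedClip(num_frames=6, fps=30, action_hz=10, actions=[reach, grasp])
for frame, action in pair_frames_with_actions(clip):
    print(frame, action.gripper_closed)
```

Frames 0 to 2 pair with the reaching command and frames 3 to 5 with the grasp; a clip whose trajectory runs out before its last frame is rejected rather than silently padded.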
For non-robotics applications, the video and image generation capabilities are useful on their own. Game developers, VFX teams, and content production pipelines can use the model to generate scenes and variations on them. Multimodal prompting (describe a scene in text, supply a reference image, get video back) is a workflow that closed commercial APIs typically charge for.
Running It Locally
The 16B Nano model is feasible on high-end consumer hardware: a single NVIDIA GeForce RTX 4090 can handle it with quantization (a compression technique that stores weights at lower precision, trading some accuracy for a smaller GPU memory footprint). The 64B Super requires multiple GPUs or cloud inference. Both models are available on Hugging Face; check the license attached to each checkpoint before commercial deployment rather than assuming open weights mean unrestricted use.
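The hardware guidance follows from simple arithmetic on parameter count. The sketch below is a back-of-the-envelope estimate that assumes a dense model with every parameter stored at the same precision and counts weights only; the precision options listed are common choices, not formats NVIDIA has confirmed for Cosmos 3.

```python
# Rough weights-only memory estimate: parameters x bits per weight / 8 bytes.
# Ignores activations, caches, and any auxiliary encoders/decoders, which add
# substantially to the real requirement for video generation.
RTX_4090_VRAM_GB = 24  # GeForce RTX 4090 ships with 24 GB of VRAM

MODELS_BILLION_PARAMS = {"Cosmos 3 Nano": 16, "Cosmos 3 Super": 64}
BITS_PER_WEIGHT = {"BF16": 16, "INT8": 8, "4-bit": 4}


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """GB (1e9 bytes) needed just to hold the weights at a given precision."""
    return params_billion * bits_per_weight / 8


for model, params in MODELS_BILLION_PARAMS.items():
    for precision, bits in BITS_PER_WEIGHT.items():
        gb = weight_memory_gb(params, bits)
        verdict = "weights fit" if gb < RTX_4090_VRAM_GB else "weights do not fit"
        print(f"{model} @ {precision}: {gb:.0f} GB -> {verdict} in {RTX_4090_VRAM_GB} GB")
```

At 16-bit precision the Nano needs about 32 GB for weights alone, more than a 4090 has, while 8-bit (16 GB) or 4-bit (8 GB) leaves headroom for activations. The Super needs about 32 GB even at 4 bits, which is why it has to be split across GPUs.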
This is the first Cosmos generation to put this range of input and output modalities into publicly downloadable weights at this capability level. Earlier Cosmos releases also published weights on Hugging Face, but as narrower, separate models. Continuing to ship open weights at this scale signals that NVIDIA is positioning Cosmos as a platform for external developers and researchers to build on, not just an internal showcase for its hardware roadmap.