Meta just released SAM 3.1, the latest version of its Segment Anything Model for real-time video object detection and tracking. The headline improvement: it can now track up to 16 objects simultaneously in a single forward pass, doubling throughput from 16 to 32 frames per second on an H100 GPU.
The previous version, SAM 3, processed each tracked object separately, so tracking five objects required roughly five times the compute. SAM 3.1 introduces "object multiplexing", a technique where all tracked objects are processed together in one pass through the model. The result is a speed advantage that grows with the number of tracked objects. Because the model reasons about all objects jointly rather than in isolation, tracking also holds up better as scenes get more complex.
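To make the scaling difference concrete, here is a toy cost model that only counts forward passes per frame. It is a sketch, not Meta's implementation: the function names are hypothetical, the 16-object capacity is the figure quoted above, and the idea that a 17th object triggers a second multiplexed pass is an assumption the article does not confirm. A multiplexed pass is also not necessarily as cheap as a single-object pass, so pass counts are only a rough proxy for compute.

```python
import math

# Capacity of one multiplexed pass, taken from the article's "up to 16 objects".
MAX_OBJECTS_PER_PASS = 16


def passes_per_frame_separate(num_objects: int) -> int:
    """Per-object processing (as described for SAM 3): one pass per object."""
    return num_objects


def passes_per_frame_multiplexed(
    num_objects: int, capacity: int = MAX_OBJECTS_PER_PASS
) -> int:
    """Multiplexed processing: objects share a pass up to `capacity`.

    ASSUMPTION: objects beyond `capacity` spill into an additional pass.
    """
    return math.ceil(num_objects / capacity)


if __name__ == "__main__":
    for n in (1, 5, 16):
        print(
            f"{n:>2} objects: "
            f"{passes_per_frame_separate(n)} separate passes vs "
            f"{passes_per_frame_multiplexed(n)} multiplexed"
        )
```

Under this model, five objects cost five passes the old way and one pass when multiplexed, which is why the gap widens as scenes get busier.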
Performance in Practice
The speed gains matter most in crowded scenes with visually similar objects, which is exactly where previous versions struggled. SAM 3.1's global reasoning approach means it's better at keeping track of, say, individual players on a sports field or multiple products on a conveyor belt, without confusing one for another.
Meta designed SAM 3.1 as a drop-in replacement for SAM 3, so existing integrations should work without code changes. The model checkpoint is available on Hugging Face along with the updated codebase.
Who This Is For
SAM (Segment Anything Model) is a computer vision model that identifies and outlines objects in images and video. "Segmentation" means drawing precise boundaries around objects: not just a bounding box, but the exact shape, pixel by pixel. This is foundational tech for video editing, AR effects, robotics, and automated quality inspection.
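The difference between a bounding box and a segmentation mask is easy to see on a tiny example. The sketch below uses a hand-written 5x5 binary mask of a diamond-shaped object (illustrative data, not SAM output) and hypothetical helper names. It compares how many pixels the mask covers with how many the enclosing box covers.

```python
# Binary mask: 1 means the pixel belongs to the object, 0 means background.
# Hand-made illustrative data, not produced by any model.
MASK = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]


def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, of the nonzero pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no object pixels")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return min(rows), min(cols), max(rows), max(cols)


def mask_area(mask):
    """Number of pixels that actually belong to the object."""
    return sum(sum(row) for row in mask)


def box_area(box):
    """Number of pixels covered by the inclusive bounding box."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


if __name__ == "__main__":
    box = bounding_box(MASK)
    print("box:", box)                    # box: (0, 0, 4, 4)
    print("mask pixels:", mask_area(MASK))  # mask pixels: 13
    print("box pixels:", box_area(box))     # box pixels: 25
```

Here the box covers 25 pixels but only 13 belong to the object. The other 12 are background that a box-based tool would treat as part of it, which matters when the goal is to cut out, recolor, or measure the object itself.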
Meta already uses SAM in its own products, powering creative tools in Instagram's Edits app and the Meta AI Vibes app. For developers and researchers building video analysis tools, the open-source release means you can run the model locally without API costs. Running at 32 FPS on a single H100 GPU puts real-time video segmentation within reach of smaller teams that don't have massive compute budgets.