SAM 3.1 introduces Object Multiplex, a shared-memory approach for joint multi-object tracking that is significantly faster without sacrificing accuracy. This release also includes new model checkpoints and optimized inference.
SAM 3's video pipeline processes each tracked object independently, which scales linearly with the number of objects. Object Multiplex groups objects into fixed-capacity buckets and processes them jointly, drastically reducing redundant computation. For technical details, see Appendix H (Object Multiplex) in the SAM 3 paper.
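To make the bucketing idea concrete, here is a minimal sketch of grouping tracked object IDs into fixed-capacity buckets. It is not the SAM 3.1 implementation: the helper names `group_into_buckets` and `passes_per_frame` and the capacity of 16 are hypothetical, chosen only for illustration. The pass count it prints shows how the number of tracker invocations per frame shrinks. It is not a prediction of wall-clock speedup.

```python
from math import ceil
from typing import List, Sequence


def group_into_buckets(object_ids: Sequence[int], capacity: int) -> List[List[int]]:
    """Split object IDs into consecutive buckets holding at most `capacity` IDs."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [list(object_ids[i:i + capacity]) for i in range(0, len(object_ids), capacity)]


def passes_per_frame(num_objects: int, capacity: int) -> int:
    """Joint passes needed per frame when each bucket is processed as one unit."""
    return ceil(num_objects / capacity)


object_ids = list(range(128))
# capacity=16 is an assumed value for illustration, not a documented SAM 3.1 setting.
buckets = group_into_buckets(object_ids, capacity=16)

print(f"independent passes per frame: {len(object_ids)}")
print(f"bucketed passes per frame:    {len(buckets)}")
```

With per-object processing the number of passes grows with the object count, while with bucketing it grows with the number of buckets. The last bucket may be only partially filled when the object count is not a multiple of the capacity.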
- ~7x speedup at 128 objects on a single H100 GPU compared to the SAM 3 November 2025 release
- Inference optimizations that significantly improve multi-object tracking efficiency:
  - Reduced CPU-GPU synchronization in detection-tracker association and other heuristics
  - Enhanced `torch.compile` support with improved operation fusion
  - Batched postprocessing and vision encoder to increase GPU utilization
- Mixed results on SA-Co/VEval video benchmarks, with notable improvement on YT-Temporal-1B (+2.1 cgF1)
- Improved VOS performance on 6 out of 7 benchmarks, including +2.0 on the challenging MOSEv2

SA-V, YT-Temporal-1B, and SmartGlasses are from the SA-Co/VEval benchmark test split; LVVIS, BURST, YTVIS21, and OVIS are public benchmarks.

| Model | SA-V cgF1 | SA-V pHOTA | YT-Temporal-1B cgF1 | YT-Temporal-1B pHOTA | SmartGlasses cgF1 | SmartGlasses pHOTA | LVVIS test mAP | BURST test HOTA | YTVIS21 val mAP | OVIS val mAP |
|---|---|---|---|---|---|---|---|---|---|---|
| SAM 3 | 30.3 | 58.0 | 50.8 | 69.9 | 36.4 | 63.6 | 36.3 | 44.5 | 57.4 | 60.5 |
| SAM 3.1 | 30.5 | 58.7 | 52.9 | 70.7 | 36.3 | 64.4 | 34.3 | 43.3 | 56.6 | 61.5 |

VOS benchmark results (metric in parentheses):

| Model | MOSEv1 val (J&F) | DAVIS17 val (J&F) | LVOSv2 val (J&F) | SA-V val (J&F) | SA-V test (J&F) | YTVOS19 val (G) | MOSEv2 val (J&Ḟ) |
|---|---|---|---|---|---|---|---|
| SAM 3 | 78.4 | 92.2 | 88.5 | 83.5 | 84.4 | 89.7 | 60.3 |
| SAM 3.1 | 79.6 | 92.7 | 89.2 | 83.8 | 85.1 | 89.3 | 62.3 |
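
The VOS claims in the highlights can be checked against the table with a few lines of arithmetic. The sketch below copies the scores from the table and computes the per-benchmark change from SAM 3 to SAM 3.1. The variable names are illustrative only.

```python
benchmarks = [
    "MOSEv1 val", "DAVIS17 val", "LVOSv2 val", "SA-V val",
    "SA-V test", "YTVOS19 val", "MOSEv2 val",
]
# Scores copied from the VOS table above.
sam3 = [78.4, 92.2, 88.5, 83.5, 84.4, 89.7, 60.3]
sam3_1 = [79.6, 92.7, 89.2, 83.8, 85.1, 89.3, 62.3]

# Round to one decimal to avoid floating-point noise in the differences.
deltas = {
    name: round(new - old, 1)
    for name, old, new in zip(benchmarks, sam3, sam3_1)
}
improved = [name for name, delta in deltas.items() if delta > 0]

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
print(f"improved on {len(improved)} of {len(benchmarks)} benchmarks")
```

The only regression is on YTVOS19 val, and the largest gain is on MOSEv2 val.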
The SAM 3.1 checkpoints are available on the Hugging Face repo. See Getting Started for download and authentication instructions.
- `sam3.1_video_predictor_example.ipynb`: Demonstrates how to use SAM 3.1 with Object Multiplex for video segmentation and dense tracking with text and point prompts.
Arpit Kalla, Chaitanya Ryali, Christian Puhrsch, Ho Kei Cheng, Joseph Greer, Meng Wang, Miran Heo, Pengchuan Zhang, Roman Rädle, Yuan-Ting Hu

