By unifying detection and tracking, SAM 3 streamlines video‑AI workflows, enabling faster product integration and opening new opportunities for real‑time visual intelligence in consumer and enterprise platforms.
The video introduces SAM 3, Meta’s latest unified model that combines object detection and tracking within a single architecture. Built on the foundation of the SAM 2 segmentation model, SAM 3 employs two dedicated transformer modules—one for detecting object instances in individual frames and another for maintaining consistent identities of those objects across video sequences.
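The split described in the video can be pictured as a two-stage loop: a per-frame detector proposes instances, and a temporal tracker stitches them into stable identities. Below is a minimal Python sketch of that flow; the class names (`FrameDetector`, `IdentityTracker`) and their placeholder logic are illustrative assumptions, not Meta's actual modules or API.

```python
"""Minimal sketch of the detect-then-track loop described above.

FrameDetector and IdentityTracker are hypothetical stand-ins for
SAM 3's internal modules, not Meta's published API.
"""
from typing import Dict, List


class FrameDetector:
    """Per-frame module: finds all instances of a prompted class."""

    def detect(self, frame, prompt: str) -> List[dict]:
        # In SAM 3 this stage is a transformer detector; here we
        # return a placeholder detection purely for illustration.
        return [{"mask": None, "score": 1.0, "label": prompt}]


class IdentityTracker:
    """Temporal module: assigns stable IDs to detections across frames."""

    def __init__(self) -> None:
        self.next_id = 0
        self.tracks: Dict[int, List[dict]] = {}

    def update(self, detections: List[dict]) -> List[int]:
        # Real association would match masks/embeddings against existing
        # tracks; this sketch simply opens a new track per detection.
        ids = []
        for det in detections:
            self.tracks[self.next_id] = [det]
            ids.append(self.next_id)
            self.next_id += 1
        return ids


def run(video_frames, prompt: str) -> None:
    detector, tracker = FrameDetector(), IdentityTracker()
    for i, frame in enumerate(video_frames):
        detections = detector.detect(frame, prompt)
        track_ids = tracker.update(detections)
        print(f"frame {i}: {len(detections)} '{prompt}' instance(s), ids={track_ids}")


if __name__ == "__main__":
    run(video_frames=[object(), object()], prompt="dog")
```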
Key technical insights focus on the divergent representation needs of detection versus tracking. Detection requires a shared representation for all instances of a class (e.g., multiple dogs should map to the same “dog” embedding), whereas tracking demands distinct embeddings for each instance to preserve identity over time. To reconcile this, Meta repurposes its detection transformer for the detection head and integrates the SAM 2 tracker for temporal continuity, while leveraging a Llama-based AI-annotation engine to generate high-quality training data.
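That tension can be made concrete with toy embeddings: in a detection space, two dogs should sit nearly on top of the shared “dog” concept, while in a tracking space they must stay separable. The vectors below are illustrative numbers, not values taken from the model.

```python
# Toy illustration of the representation tension described above.
# Detection embeddings collapse instances of a class together;
# tracking embeddings keep each instance apart.
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Two dogs in the same frame.
# Detection space: both map near the shared "dog" concept.
dog1_det = np.array([0.98, 0.10, 0.00])
dog2_det = np.array([0.97, 0.12, 0.02])

# Tracking space: each instance needs its own separable identity vector.
dog1_trk = np.array([0.9, 0.1, 0.0])
dog2_trk = np.array([0.1, 0.9, 0.1])

print(f"detection-space similarity: {cosine(dog1_det, dog2_det):.2f}")  # ~1.00: same class
print(f"tracking-space similarity:  {cosine(dog1_trk, dog2_trk):.2f}")  # ~0.22: distinct identities
```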
The presenter highlights practical examples, noting that “one dog needs a different representation than another dog” to illustrate the tracking challenge. SAM 3 is positioned as a versatile tool that can operate standalone, augment multimodal large‑language models, or power consumer features such as Instagram’s “Edits” app, where segmented objects receive dynamic visual effects.
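As a rough illustration of that kind of effect, the sketch below applies a per-object treatment given a binary instance mask: the segmented object keeps its color while the background is desaturated. The function name and data are hypothetical; a real pipeline would consume SAM 3's per-frame masks rather than synthetic arrays.

```python
# Hypothetical per-object effect: keep the masked object in color,
# convert everything outside the mask to grayscale.
import numpy as np


def highlight_object(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend a color frame with its grayscale version using an instance mask."""
    gray = frame.mean(axis=-1, keepdims=True).repeat(3, axis=-1)  # per-pixel luminance
    mask3 = mask[..., None].astype(frame.dtype)                   # broadcast mask to 3 channels
    return frame * mask3 + gray * (1 - mask3)


frame = np.random.rand(4, 4, 3)               # tiny synthetic RGB frame
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1                            # synthetic segmented-object region
styled = highlight_object(frame, mask)
```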
If successful, SAM 3 could become a milestone in computer‑vision research by delivering a single, scalable model that handles both detection and tracking, reducing the need for separate pipelines and accelerating deployment in real‑time video applications across social media, autonomous systems, and enterprise analytics.