By automating and massively scaling high-quality image annotation, Meta eased the data bottleneck for object recognition, yielding substantial performance gains and tighter integration of vision and language capabilities that could accelerate multimodal AI applications and product features.
Meta's SAM 3 adds text prompting to its segmentation model: users type a short phrase, and the model automatically finds and segments every matching object. To scale annotated training data, Meta used fine-tuned Llama-based AI annotators that learned from human examples to produce both positive and negative labels, enabling faster, more accurate mask creation at much larger scale. The resulting dataset emphasizes short, diverse phrases, and Meta reports roughly double the performance of competing systems. SAM 3 is positioned to integrate with large language models for more complex multimodal tasks, and Meta frames it as a vision agent that enhances language models' visual perception.
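To make the interaction concrete, here is a minimal sketch of what text-prompted segmentation looks like from the caller's side. The `TextPromptedSegmenter` class and `Mask` type are hypothetical stand-ins, not Meta's published API; they only illustrate the contract the article describes: a short noun phrase in, one mask per matching instance out.

```python
# Hypothetical sketch of concept-prompted segmentation, SAM 3-style.
# This mock interface is an assumption for illustration, not Meta's API.
from dataclasses import dataclass

import numpy as np


@dataclass
class Mask:
    pixels: np.ndarray  # boolean array, shape (H, W); True = object pixel
    score: float        # confidence that this instance matches the phrase


class TextPromptedSegmenter:
    """Mock model: returns one dummy instance so the example runs end to end."""

    def segment(self, image: np.ndarray, phrase: str) -> list[Mask]:
        # A real model would encode the image, embed the phrase, and decode
        # one mask per matching instance; here we fabricate a single box.
        h, w = image.shape[:2]
        dummy = np.zeros((h, w), dtype=bool)
        dummy[h // 4 : h // 2, w // 4 : w // 2] = True  # pretend detection
        return [Mask(pixels=dummy, score=0.9)]


if __name__ == "__main__":
    image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder image
    model = TextPromptedSegmenter()
    for mask in model.segment(image, "yellow school bus"):
        print(f"score={mask.score:.2f}, area={mask.pixels.sum()} px")
```

The key difference from earlier, point-and-box-prompted SAM versions is the return type: a list of masks, one per instance of the named concept, rather than a single mask at a clicked location.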
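The data engine can be sketched in the same spirit. Below, a mock "AI annotator" stands in for the fine-tuned Llama-based annotators the article mentions: it accepts or rejects candidate phrase-mask pairs, and both verdicts are kept, positives as training masks and negatives as hard negatives. All names, the quality signal, and the verification logic are illustrative assumptions, not Meta's pipeline.

```python
# Illustrative sketch of an AI-assisted annotation loop: an automated
# annotator accepts or rejects candidate phrase-mask pairs, keeping both
# verdicts (positives for training, negatives as hard negatives).
# Names and logic are assumptions for illustration, not Meta's pipeline.
from dataclasses import dataclass


@dataclass
class Candidate:
    image_id: str
    phrase: str          # short noun phrase, e.g. "striped umbrella"
    mask_quality: float  # stand-in for whatever signal the annotator scores


def ai_annotator_verdict(c: Candidate, threshold: float = 0.5) -> bool:
    """Mock annotator: accept candidates whose quality clears a threshold.

    A real system would use a fine-tuned LLM, trained on human examples,
    judging whether the mask actually matches the phrase.
    """
    return c.mask_quality >= threshold


def build_training_set(candidates: list[Candidate]):
    positives, hard_negatives = [], []
    for c in candidates:
        (positives if ai_annotator_verdict(c) else hard_negatives).append(c)
    return positives, hard_negatives


if __name__ == "__main__":
    pool = [
        Candidate("img_001", "yellow school bus", 0.92),
        Candidate("img_001", "yellow taxi", 0.31),  # wrong concept: negative
        Candidate("img_002", "striped umbrella", 0.77),
    ]
    pos, neg = build_training_set(pool)
    print(f"{len(pos)} positives, {len(neg)} hard negatives")
```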