
SAM Audio, Meta's latest AI release, is positioned as the first unified multimodal model capable of separating audio sources across music, speech, and ambient sound. Users can isolate a specific sound by issuing text prompts such as "remove the drums" or by providing visual cues, like a waveform snippet, effectively turning audio editing into a conversational task. The model combines large-scale audio training with multimodal prompt handling, including "span prompts" that let users specify precise temporal boundaries for an extraction. Meta highlights the ability to layer multiple prompts, enabling complex workflows such as extracting a vocal line while simultaneously suppressing background chatter. Early benchmarks suggest SAM Audio matches or exceeds specialist separation tools while offering a single, unified interface.

In the demo, Meta engineers show a musician pulling the bass track from a full mix with a simple text command, an audio engineer cleaning up a conference recording with a visual cue, and a video creator isolating crowd noise for a cinematic effect. One quoted line, "Use a span prompt to get even more precision," underscores the emphasis on fine-grained control, while the mention of "tinkerers of all skill levels" signals an intent to democratize the technology. If the model lives up to its promises, it could reshape the audio-production landscape by lowering the barrier to high-quality separation and reducing reliance on costly hardware and specialized software. Content platforms, advertisers, and streaming services may integrate SAM Audio to automate remixing, captioning, and sound-design tasks, accelerating time-to-market for audio-rich media.
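Meta has not published an API in this summary, so the sketch below is purely illustrative: the names TextPrompt, SpanPrompt, and separate are hypothetical, and only the span-masking step does any real work. The point is to show how layered text and span prompts could compose into a single separation call.

```python
# Hypothetical sketch only: SAM Audio's real interface is not described in
# the summary, so every name here is an illustrative stand-in.
from dataclasses import dataclass
import numpy as np

@dataclass
class TextPrompt:
    query: str            # e.g. "remove the drums"

@dataclass
class SpanPrompt:
    start_s: float        # precise temporal boundaries for the extraction
    end_s: float

def separate(audio: np.ndarray, sr: int, prompts: list) -> np.ndarray:
    """Stand-in for a SAM-Audio-style call that returns an isolated stem."""
    out = audio.copy()
    for p in prompts:
        if isinstance(p, SpanPrompt):
            # A span prompt restricts the edit to a time window.
            lo, hi = int(p.start_s * sr), int(p.end_s * sr)
            mask = np.zeros_like(out)
            mask[lo:hi] = 1.0
            out = out * mask
        elif isinstance(p, TextPrompt):
            # The real model would condition separation on the text query;
            # this placeholder just records that the prompt was applied.
            print(f"conditioning on text prompt: {p.query!r}")
    return out

sr = 44_100
mix = np.random.randn(sr * 60).astype(np.float32)   # fake 60 s mono mix
vocal = separate(mix, sr, [TextPrompt("isolate the lead vocal"),
                           TextPrompt("suppress background chatter"),
                           SpanPrompt(start_s=30.0, end_s=45.0)])
```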

The video introduces SAM 3, Meta’s latest unified model that combines object detection and tracking within a single architecture. Built on the foundation of the SAM 2 segmentation model, SAM 3 employs two dedicated transformer modules—one for detecting object instances in individual frames...
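The detect-then-track split lends itself to a small illustration. The toy below is not SAM 3's transformer modules; it is a minimal greedy IoU matcher that only shows the division of labor: per-frame detections (the first module's job) get linked into identity-preserving tracks (the second's).

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def track(frames_of_boxes, iou_thresh=0.5):
    """Greedily link each frame's detections to the closest existing track."""
    tracks, next_id = {}, 0          # track_id -> [(frame_idx, box), ...]
    for f, boxes in enumerate(frames_of_boxes):
        for box in boxes:
            best = max(tracks, default=None,
                       key=lambda t: iou(tracks[t][-1][1], box))
            if best is not None and iou(tracks[best][-1][1], box) >= iou_thresh:
                tracks[best].append((f, box))   # same object, new frame
            else:
                tracks[next_id] = [(f, box)]    # a newly appeared object
                next_id += 1
    return tracks

dets = [[(10, 10, 50, 50)],                        # frame 0: one object
        [(12, 11, 52, 51), (100, 100, 140, 140)]]  # frame 1: it moved; one new
print(track(dets))   # one continued track plus one new track
```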

Researchers at Carnegie Mellon are integrating advanced AI models such as Meta's SAM 3D Body with biomechanical motion-capture data to create personalized rehabilitation programs. By combining highly accurate lab-based motion capture with billions of everyday images of natural movement, the...

Meta and Conservation X Labs are deploying advanced AI — including SAM 3 and CM3 — to automate identification and behavioral monitoring of wildlife in camera-trap videos, enabling precise individual-level tracking rather than simple bounding boxes. The partners will release...

Meta's SAM 3D uses a two-model approach—one specialized for 3D human body reconstruction and a second generic model for 3D object reconstruction—to bring recognition and prior knowledge into areas where geometry-based methods fall short. The team borrowed preference optimization techniques...
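The summary is cut off before naming the exact techniques, so purely as a point of reference, here is a generic DPO-style preference loss (Rafailov et al., 2023), one common form of preference optimization, applied to pairs where human raters preferred one candidate reconstruction over another. This is an assumption for illustration, not SAM 3D's actual recipe.

```python
# Generic DPO-style objective: an assumed example of "preference
# optimization", not the specific method the SAM 3D team used.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Push the policy to score the human-preferred reconstruction above
    the rejected one, measured relative to a frozen reference model."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -F.logsigmoid(beta * margin).mean()

# Dummy batch of 4 preference pairs (log-likelihoods of each candidate).
lp_c = torch.randn(4, requires_grad=True)
lp_r = torch.randn(4, requires_grad=True)
loss = dpo_loss(lp_c, lp_r, lp_c.detach(), lp_r.detach())
loss.backward()
print(float(loss))
```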

Meta’s SAM 3 introduces text prompting to its segmentation model, allowing users to input short phrases and have the model automatically find and segment objects. To scale annotated training data, Meta used fine-tuned LLaMA-based AI annotators that learned from human...
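To make the annotator loop concrete, here is a minimal hypothetical sketch of such a data engine. The functions propose_phrases and verify_mask stand in for the fine-tuned LLaMA-based annotators, and the confidence threshold that routes only hard cases to humans is an assumption, not Meta's documented pipeline.

```python
# Hypothetical data-engine loop; all names are illustrative stand-ins.
def propose_phrases(image):
    """Stand-in for a fine-tuned LLaMA annotator suggesting noun phrases."""
    return ["red umbrella", "street sign"]

def verify_mask(image, phrase, mask):
    """Stand-in: the annotator scores phrase/mask agreement in [0, 1]."""
    return 0.92

def data_engine(images, segment, audit_threshold=0.8):
    """Route high-confidence labels to training data, hard cases to humans."""
    accepted, needs_human = [], []
    for img in images:
        for phrase in propose_phrases(img):
            mask = segment(img, phrase)          # text-prompted segmentation
            score = verify_mask(img, phrase, mask)
            bucket = accepted if score >= audit_threshold else needs_human
            bucket.append((img, phrase, mask))
    return accepted, needs_human

# Toy run with a dummy segmenter standing in for the model itself.
done, pending = data_engine(["img_001"], segment=lambda img, p: f"mask<{p}>")
print(len(done), len(pending))   # 2 0
```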

Meta unveiled Segment Anything Model 3 (SAM 3), a unified model that combines detection, segmentation and tracking for images and video. Building on click prompting from previous versions, SAM 3 introduces text prompting and visual prompting to detect and segment...
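As a closing illustration, the sketch below imagines what a unified prompting surface might look like in code: clicks, visual exemplar boxes, and text phrases all flow through one entry point. Every name here (Click, Box, segment_video) is an assumption for illustration, not the API from Meta's release.

```python
# Illustrative stand-in for a unified prompting surface; not Meta's API.
from dataclasses import dataclass
from typing import Union

@dataclass
class Click:          # point prompt, carried over from SAM 1 and SAM 2
    x: int
    y: int

@dataclass
class Box:            # visual exemplar prompt: "find things like this"
    x1: int
    y1: int
    x2: int
    y2: int

Prompt = Union[Click, Box, str]   # a plain string is a short text phrase

def segment_video(frames, prompt: Prompt):
    """One entry point for every prompt type: the model finds all matching
    instances, segments them, and tracks the masks across frames."""
    if isinstance(prompt, str):
        print(f"text prompt: segment every instance of {prompt!r}")
    elif isinstance(prompt, Click):
        print(f"point prompt at ({prompt.x}, {prompt.y})")
    else:
        print("exemplar prompt: match objects similar to the given box")
    return [dict() for _ in frames]   # placeholder: per-frame {id: mask}

masks = segment_video(frames=range(3), prompt="striped cat")
```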