
SAM Audio, Meta's latest AI release, is positioned as the first unified multimodal model capable of separating audio sources across music, speech, and ambient sound. Users can isolate a specific sound by issuing text prompts such as "remove the drums" or by providing visual cues, like a waveform snippet, effectively turning audio editing into a conversational task. The model combines large-scale audio training with multimodal prompt handling, including "span prompts" that let users specify precise temporal boundaries for an extraction. Meta highlights the ability to layer multiple prompts, enabling complex workflows such as extracting a vocal line while simultaneously suppressing background chatter. Early benchmarks suggest SAM Audio matches or exceeds specialist separation tools while offering a single, unified interface.

In the demo, Meta engineers show a musician pulling the bass track from a full mix with a simple text command, an audio engineer cleaning up a conference recording with a visual cue, and a video creator isolating crowd noise for a cinematic effect. One quoted line, "Use a span prompt to get even more precision," underscores the emphasis on fine-grained control, while the mention of "tinkerers of all skill levels" signals an intent to democratize the technology. If the model lives up to its promises, it could reshape the audio-production landscape by lowering the barrier to high-quality separation and reducing reliance on costly hardware and specialized software. Content platforms, advertisers, and streaming services may integrate SAM Audio to automate remixing, captioning, and sound-design tasks, accelerating time-to-market for audio-rich media.
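Meta has not published an API in this summary, so the sketch below is purely illustrative: the names TextPrompt, SpanPrompt, and separate are hypothetical, and only the span-masking step does any real work. The point is to show how layered text and span prompts could compose into a single separation call.

```python
# Hypothetical sketch only: SAM Audio's real interface is not described in
# the summary, so every name here is an illustrative stand-in.
from dataclasses import dataclass
import numpy as np

@dataclass
class TextPrompt:
    query: str            # e.g. "remove the drums"

@dataclass
class SpanPrompt:
    start_s: float        # precise temporal boundaries for the extraction
    end_s: float

def separate(audio: np.ndarray, sr: int, prompts: list) -> np.ndarray:
    """Stand-in for a SAM-Audio-style call that returns an isolated stem."""
    out = audio.copy()
    for p in prompts:
        if isinstance(p, SpanPrompt):
            # A span prompt restricts the edit to a time window.
            lo, hi = int(p.start_s * sr), int(p.end_s * sr)
            mask = np.zeros_like(out)
            mask[lo:hi] = 1.0
            out = out * mask
        elif isinstance(p, TextPrompt):
            # The real model would condition separation on the text query;
            # this placeholder just records that the prompt was applied.
            print(f"conditioning on text prompt: {p.query!r}")
    return out

sr = 44_100
mix = np.random.randn(sr * 60).astype(np.float32)   # fake 60 s mono mix
vocal = separate(mix, sr, [TextPrompt("isolate the lead vocal"),
                           TextPrompt("suppress background chatter"),
                           SpanPrompt(start_s=30.0, end_s=45.0)])
```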

The video introduces SAM 3, Meta’s latest unified model that combines object detection and tracking within a single architecture. Built on the foundation of the SAM 2 segmentation model, SAM 3 employs two dedicated transformer modules—one for detecting object instances in individual frames...
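The detect-then-track split lends itself to a small illustration. The toy below is not SAM 3's transformer modules; it is a minimal greedy IoU matcher that only shows the division of labor: per-frame detections (the first module's job) get linked into identity-preserving tracks (the second's).

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def track(frames_of_boxes, iou_thresh=0.5):
    """Greedily link each frame's detections to the closest existing track."""
    tracks, next_id = {}, 0          # track_id -> [(frame_idx, box), ...]
    for f, boxes in enumerate(frames_of_boxes):
        for box in boxes:
            best = max(tracks, default=None,
                       key=lambda t: iou(tracks[t][-1][1], box))
            if best is not None and iou(tracks[best][-1][1], box) >= iou_thresh:
                tracks[best].append((f, box))   # same object, new frame
            else:
                tracks[next_id] = [(f, box)]    # a newly appeared object
                next_id += 1
    return tracks

dets = [[(10, 10, 50, 50)],                        # frame 0: one object
        [(12, 11, 52, 51), (100, 100, 140, 140)]]  # frame 1: it moved; one new
print(track(dets))   # one continued track plus one new track
```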

Researchers at Carnegie Mellon are integrating advanced AI models such as Meta's SAM 3D Body with biomechanical motion-capture data to create personalized rehabilitation programs. By combining highly accurate lab-based motion capture with billions of everyday images of natural movement, the...

Meta and Conservation X Labs are deploying advanced AI — including SAM 3 and CM3 — to automate identification and behavioral monitoring of wildlife in camera-trap videos, enabling precise individual-level tracking rather than simple bounding boxes. The partners will release...

Meta's SAM 3D uses a two-model approach—one specialized for 3D human body reconstruction and a second generic model for 3D object reconstruction—to bring recognition and prior knowledge into areas where geometry-based methods fall short. The team borrowed preference optimization techniques...
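The summary is cut off before naming the exact techniques, so purely as a point of reference, here is a generic DPO-style preference loss (Rafailov et al., 2023), one common form of preference optimization, applied to pairs where human raters preferred one candidate reconstruction over another. This is an assumption for illustration, not SAM 3D's actual recipe.

```python
# Generic DPO-style objective: an assumed example of "preference
# optimization", not the specific method the SAM 3D team used.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Push the policy to score the human-preferred reconstruction above
    the rejected one, measured relative to a frozen reference model."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -F.logsigmoid(beta * margin).mean()

# Dummy batch of 4 preference pairs (log-likelihoods of each candidate).
lp_c = torch.randn(4, requires_grad=True)
lp_r = torch.randn(4, requires_grad=True)
loss = dpo_loss(lp_c, lp_r, lp_c.detach(), lp_r.detach())
loss.backward()
print(float(loss))
```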

Meta’s SAM 3 introduces text prompting to its segmentation model, allowing users to input short phrases and have the model automatically find and segment objects. To scale annotated training data, Meta used fine-tuned LLaMA-based AI annotators that learned from human...
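To make the annotator loop concrete, here is a minimal hypothetical sketch of such a data engine. The functions propose_phrases and verify_mask stand in for the fine-tuned LLaMA-based annotators, and the confidence threshold that routes only hard cases to humans is an assumption, not Meta's documented pipeline.

```python
# Hypothetical data-engine loop; all names are illustrative stand-ins.
def propose_phrases(image):
    """Stand-in for a fine-tuned LLaMA annotator suggesting noun phrases."""
    return ["red umbrella", "street sign"]

def verify_mask(image, phrase, mask):
    """Stand-in: the annotator scores phrase/mask agreement in [0, 1]."""
    return 0.92

def data_engine(images, segment, audit_threshold=0.8):
    """Route high-confidence labels to training data, hard cases to humans."""
    accepted, needs_human = [], []
    for img in images:
        for phrase in propose_phrases(img):
            mask = segment(img, phrase)          # text-prompted segmentation
            score = verify_mask(img, phrase, mask)
            bucket = accepted if score >= audit_threshold else needs_human
            bucket.append((img, phrase, mask))
    return accepted, needs_human

# Toy run with a dummy segmenter standing in for the model itself.
done, pending = data_engine(["img_001"], segment=lambda img, p: f"mask<{p}>")
print(len(done), len(pending))   # 2 0
```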

Meta unveiled Segment Anything Model 3 (SAM 3), a unified model that combines detection, segmentation and tracking for images and video. Building on click prompting from previous versions, SAM 3 introduces text prompting and visual prompting to detect and segment...
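As a closing illustration, the sketch below imagines what a unified prompting surface might look like in code: clicks, visual exemplar boxes, and text phrases all flow through one entry point. Every name here (Click, Box, segment_video) is an assumption for illustration, not the API from Meta's release.

```python
# Illustrative stand-in for a unified prompting surface; not Meta's API.
from dataclasses import dataclass
from typing import Union

@dataclass
class Click:          # point prompt, carried over from SAM 1 and SAM 2
    x: int
    y: int

@dataclass
class Box:            # visual exemplar prompt: "find things like this"
    x1: int
    y1: int
    x2: int
    y2: int

Prompt = Union[Click, Box, str]   # a plain string is a short text phrase

def segment_video(frames, prompt: Prompt):
    """One entry point for every prompt type: the model finds all matching
    instances, segments them, and tracks the masks across frames."""
    if isinstance(prompt, str):
        print(f"text prompt: segment every instance of {prompt!r}")
    elif isinstance(prompt, Click):
        print(f"point prompt at ({prompt.x}, {prompt.y})")
    else:
        print("exemplar prompt: match objects similar to the given box")
    return [dict() for _ in frames]   # placeholder: per-frame {id: mask}

masks = segment_video(frames=range(3), prompt="striped cat")
```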