SAM Audio gives creators an AI‑driven, low‑cost way to isolate and manipulate individual sounds, potentially disrupting traditional audio‑editing tools and speeding up content production across music, film, and digital media.
Meta has introduced SAM Audio, its latest AI model, positioning it as the first unified multimodal model capable of separating audio sources across music, speech, and ambient sound. The system lets users isolate a specific sound by issuing a text prompt, such as “remove the drums,” or by providing a visual cue, like a waveform snippet, effectively turning audio editing into a conversational task.
The model’s architecture blends large‑scale audio training data with multimodal prompt handling, including “span prompts” that let users specify precise temporal boundaries for extraction. Meta highlights the ability to layer multiple prompts, enabling complex workflows such as extracting a vocal line while simultaneously suppressing background chatter. Early benchmarks suggest SAM Audio matches or exceeds specialist separation tools while offering a single, unified interface.
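The layered-prompt workflow described above can be sketched in miniature. SAM Audio’s actual API has not been detailed in this article, so everything below (the `Prompt` class, the `separate` function, and the stem representation) is hypothetical, meant only to show how a text prompt and a span prompt might compose:

```python
from dataclasses import dataclass

@dataclass
class Prompt:
    # Hypothetical prompt type: "text" selects by label,
    # "span" selects by a (start, end) time window in seconds.
    kind: str
    value: object

def separate(stems, prompts):
    """Toy separation: filter labeled stems by layered prompts.

    `stems` is a list of (label, start, end) tuples standing in for
    time-aligned sources; a real model would operate on waveforms.
    Each prompt narrows the previous selection, mimicking layering.
    """
    selected = stems
    for p in prompts:
        if p.kind == "text":
            selected = [s for s in selected if s[0] == p.value]
        elif p.kind == "span":
            lo, hi = p.value
            # Keep stems that overlap the requested time window.
            selected = [s for s in selected if s[1] < hi and s[2] > lo]
    return selected

# Layering a text prompt with a span prompt, as the article describes:
mix = [("vocals", 0.0, 30.0), ("drums", 0.0, 30.0), ("vocals", 45.0, 60.0)]
result = separate(mix, [Prompt("text", "vocals"), Prompt("span", (40.0, 60.0))])
# Only the vocal stem overlapping the 40–60 s window remains.
```

The point of the sketch is the composition order: each prompt refines the output of the previous one, which is what makes workflows like “extract the vocal line while suppressing background chatter” expressible as a short prompt list rather than a multi-pass editing session.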
In the demo, Meta engineers showcase a musician pulling the bass track from a full mix using a simple text command, an audio engineer cleaning up a conference recording with a visual cue, and a video creator isolating crowd noise for a cinematic effect. One quoted line—“Use a span prompt to get even more precision”—underscores the emphasis on fine‑grained control, while the mention of “tinkerers of all skill levels” signals an intent to democratize the technology.
If the model lives up to its promises, it could reshape the audio‑production landscape by lowering the barrier to high‑quality separation, reducing reliance on costly hardware and specialized software. Content platforms, advertisers, and streaming services may integrate SAM Audio to automate remixing, captioning, and sound‑design tasks, accelerating time‑to‑market for audio‑rich media.