World's Largest Open-Source Multimodal Dataset Delivers 17x Training Efficiency, Unlocking Enterprise AI that Connects Documents, Audio and Video

VentureBeat, Oct 17, 2025

Why It Matters

The combination of scale, cleaner evaluation and parameter efficiency makes multimodal AI practical for enterprise search, compliance, healthcare, robotics and edge deployment, lowering compute costs and unlocking cross-silo insights from documents, audio and video.

Summary

Encord today released EMM-1, the largest open-source multimodal dataset, with 1 billion paired examples and 100 million data groups spanning five modalities (text, image, video, audio and 3D point clouds). It ships alongside EBind, a training methodology that emphasizes data quality to deliver up to 17x parameter efficiency. A compact 1.8-billion-parameter model trained on EMM-1 matches the performance of models up to 17 times larger and cuts training time from days to hours on a single GPU by eliminating data leakage and using a single-base, multi-encoder architecture.
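The article does not publish EBind's internals, but the "single-base, multi-encoder" idea it names can be sketched: each modality gets its own lightweight encoder head that projects into one shared embedding space, so cross-modal comparison reduces to a dot product. The dimensions, encoder form (random linear projections), and modality names below are illustrative assumptions, not Encord's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_SHARED = 64  # hypothetical shared embedding size (not from the article)

# Hypothetical per-modality feature sizes; EMM-1's real dimensions are not stated here.
modal_dims = {"text": 128, "image": 256, "audio": 96}

# One lightweight projection ("encoder head") per modality into the shared space.
encoders = {m: rng.normal(size=(d, DIM_SHARED)) for m, d in modal_dims.items()}

def embed(modality: str, x: np.ndarray) -> np.ndarray:
    """Project a raw feature vector into the shared space and L2-normalize it."""
    z = x @ encoders[modality]
    return z / np.linalg.norm(z)

# Toy inputs standing in for one paired text/image example.
text_vec = embed("text", rng.normal(size=128))
image_vec = embed("image", rng.normal(size=256))

# With a single shared space, cross-modal similarity is just a dot product
# of the normalized embeddings (i.e., cosine similarity).
similarity = float(text_vec @ image_vec)
print(f"text-image cosine similarity: {similarity:.3f}")
```

The parameter efficiency claimed for this style of design comes from sharing one base representation space: adding a modality only adds its encoder head, rather than a full new model.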
