LUMI AI Factory Launches Dataset-as-a-Service to Bring Data Closer to Compute

HPCwire, Apr 1, 2026

Key Takeaways

  • Data resides where compute power lives, reducing latency
  • Unified catalog merges metadata, permissions, and location
  • Service built on existing FAIR components, lowering risk
  • Initial catalog offers more than 1 petabyte across more than 1,000 datasets
  • Automating DaaS functions will speed AI model training

Pulse Analysis

The traditional workflow of shuffling terabytes of training data between storage archives and high‑performance clusters creates latency and consumes network bandwidth. LUMI’s Dataset-as-a-Service flips this model by exposing datasets at the point of compute, effectively shrinking the data‑to‑insight pipeline. For AI developers, this means faster iteration cycles, more reproducible experiments, and reduced storage‑related expenses, especially when working with massive language‑model corpora that can span petabytes.

Under the hood, DaaS is not a monolithic platform but a modular assembly of existing, battle‑tested services. CSC’s Fairdata‑Metax supplies a robust metadata warehouse, while Fairdata‑Etsin provides the searchable interface. Object storage is delivered by LUMI‑O, and access control is managed through REMS, with IT4I’s LEXIS handling cross‑system orchestration. This plug‑and‑play architecture minimizes development risk, cuts capital outlay, and enables rapid scaling as demand for new datasets grows.
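The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component-to-role mapping comes from the article, but `CatalogEntry`, `resolve_location`, the field names, and the example storage URI are hypothetical and do not reflect the actual APIs of Metax, Etsin, LUMI-O, REMS, or LEXIS.

```python
from dataclasses import dataclass, field

# Component roles as described in the article (names only, no real API calls).
COMPONENT_ROLES = {
    "Fairdata-Metax": "metadata",
    "Fairdata-Etsin": "search interface",
    "LUMI-O": "object storage",
    "REMS": "access control",
    "LEXIS": "cross-system orchestration",
}


@dataclass
class CatalogEntry:
    """Hypothetical unified record combining metadata, permissions, and location."""
    dataset_id: str
    title: str
    location: str  # hypothetical object-storage URI, for illustration only
    approved_users: set[str] = field(default_factory=set)


def resolve_location(entry: CatalogEntry, user: str) -> str:
    """Return the storage location only if the user has been granted access."""
    if user not in entry.approved_users:
        raise PermissionError(f"{user} has no access to {entry.dataset_id}")
    return entry.location


# Illustrative entry; every value here is made up.
entry = CatalogEntry(
    dataset_id="example-001",
    title="Example dataset",
    location="s3://example-bucket/example-001/",
    approved_users={"alice"},
)
print(resolve_location(entry, "alice"))
```

The point of the sketch is that a single catalog record can answer three questions at once (what the dataset is, who may use it, and where it sits next to compute), which is the role the article attributes to the unified catalog.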

The early catalog already showcases the Open Web Search Index, a continuously refreshed repository exceeding one petabyte and containing more than a thousand distinct datasets. Such breadth offers a ready‑made foundation for search‑engine research, analytics, and large‑language‑model training without the need for independent web crawling. As automation matures and additional datasets are onboarded, DaaS is poised to become a cornerstone of Europe’s AI infrastructure, driving faster innovation and lowering barriers for both data providers and AI practitioners.
