
Liquid AI Released LFM2.5-350M: A Compact 350M Parameter Model Trained on 28T Tokens with Scaled Reinforcement Learning
Why It Matters
The model shows that intelligence density can outpace raw parameter count, bringing high-performance generative AI to low-power devices and expanding the market beyond data-center-only solutions.
Key Takeaways
- 350M-parameter model trained on 28 trillion tokens for high intelligence density.
- Hybrid LIV + GQA architecture cuts KV-cache memory dramatically (see the back-of-envelope sketch after this list).
- 32k context window runs in under 81 MB on mobile GPUs.
- Outperforms models twice its size on IFEval, GPQA, and MMLU benchmarks.
- 40.4K tokens/sec throughput on a single H100 GPU.
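As a back-of-envelope sketch of the memory takeaways, the snippet below compares the KV cache of a hypothetical 16-layer all-attention baseline with a hybrid layout in which only the 6 attention layers cache K/V. The head counts and head dimension here are illustrative assumptions, not published LFM2.5-350M specifications, so the absolute figures will not reproduce the 81 MB number; the structural point is that convolution blocks contribute no per-token cache at all, and GQA shrinks the cache of the remaining layers.

```python
def kv_cache_mb(n_attn_layers: int, n_kv_heads: int, head_dim: int = 64,
                seq_len: int = 32_768, bytes_per_val: int = 2) -> float:
    """KV-cache size in MB: a K and a V tensor for every attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_val / 2**20

# Hypothetical all-attention baseline: 16 MHA layers, 16 KV heads each.
print(f"{kv_cache_mb(16, 16):,.0f} MB")  # 2,048 MB
# Hybrid layout: only the 6 GQA layers cache K/V, with 4 KV heads (assumed).
print(f"{kv_cache_mb(6, 4):,.0f} MB")    # 192 MB
```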
Pulse Analysis
The AI community has long accepted scaling laws that equate larger models with greater capability. Liquid AI’s LFM2.5-350M challenges that narrative by demonstrating that a compact model, when fed an unprecedented token volume, can achieve comparable or superior performance on targeted benchmarks. This shift underscores a growing focus on intelligence density—maximizing the utility of each parameter—rather than simply adding more layers, a trend that could reshape research priorities and funding allocations.
At the heart of LFM2.5-350M lies a hybrid backbone that replaces the traditional all-attention Transformer with ten double-gated LIV convolution blocks and six Grouped Query Attention (GQA) modules. The LIV blocks behave like modern recurrent layers: they process tokens in linear time and carry a constant-size state, avoiding both the quadratic attention compute and the per-token KV cache that grows with context length in pure attention models. The result is a 32k-token context window that fits within 81 MB on mobile GPUs and under 170 MB on NPUs, enabling real-time inference on devices previously unable to host generative models. A structural sketch of the two block types follows.
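The PyTorch sketch below illustrates the general idea of the two block types; it is not Liquid AI's implementation. The hidden width (1024), head counts, kernel size, gating functions, and the conv-then-attention ordering of the 16 blocks are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleGatedShortConv(nn.Module):
    """Sketch of a double-gated short-convolution (LIV-style) block.
    Its recurrent state is just the last kernel_size - 1 tokens, so
    memory stays constant no matter how long the context grows."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 3 * dim)          # value + two gates
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              groups=dim,               # depthwise
                              padding=kernel_size - 1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        v, g_in, g_out = self.in_proj(x).chunk(3, dim=-1)
        v = torch.sigmoid(g_in) * v                     # input gate
        v = self.conv(v.transpose(1, 2))                # (batch, dim, seq + k - 1)
        v = v[..., : x.shape[1]].transpose(1, 2)        # keep causal positions
        return self.out_proj(torch.sigmoid(g_out) * v)  # output gate

class GQABlock(nn.Module):
    """Grouped Query Attention: many query heads share a few K/V heads,
    shrinking the KV cache by a factor of n_heads / n_kv_heads."""
    def __init__(self, dim: int, n_heads: int = 16, n_kv_heads: int = 4):
        super().__init__()
        self.hd, self.n_heads, self.n_kv = dim // n_heads, n_heads, n_kv_heads
        self.q = nn.Linear(dim, n_heads * self.hd)
        self.kv = nn.Linear(dim, 2 * n_kv_heads * self.hd)
        self.o = nn.Linear(n_heads * self.hd, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q(x).view(B, T, self.n_heads, self.hd).transpose(1, 2)
        k, v = self.kv(x).chunk(2, dim=-1)
        k = k.view(B, T, self.n_kv, self.hd).transpose(1, 2)
        v = v.view(B, T, self.n_kv, self.hd).transpose(1, 2)
        rep = self.n_heads // self.n_kv                 # queries per KV head
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o(y.transpose(1, 2).reshape(B, T, -1))

# The reported 10 + 6 layout (the exact interleaving is an assumption):
blocks = nn.ModuleList([DoubleGatedShortConv(1024) for _ in range(10)]
                       + [GQABlock(1024) for _ in range(6)])
```

Note that during generation only the six GQA blocks need a growing KV cache; each convolution block carries only a fixed window of recent tokens, which is why the long-context memory footprint stays small.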
From a market perspective, this edge‑first approach could democratize AI‑driven services, allowing enterprises to embed sophisticated instruction‑following agents directly into smartphones, IoT gateways, or on‑premise servers without relying on costly cloud APIs. Competitors may accelerate their own lightweight model programs, while cloud providers could see reduced bandwidth demand as more workloads shift to the edge. As the ecosystem embraces models like LFM2.5-350M, we can expect a surge in specialized AI applications—data extraction, tool use, and real‑time classification—delivered with lower latency and tighter privacy controls.