Show HN: Needle: We Distilled Gemini Tool Calling Into a 26M Model

Hacker News, May 12, 2026

Why It Matters

By delivering high-quality tool calling in a 26M-parameter model, Needle makes function-calling AI practical on phones, wearables, and other edge devices, lowering deployment costs and opening the technique to more developers.

Key Takeaways

  • Open-source 26M-parameter model matches Gemini 3.1 on tool calling
  • Distilled on 200B tokens; fine-tuned on 2B tokens
  • Runs locally at 6,000 tokens/sec prefill and 1,200 tokens/sec decode
  • Outperforms FunctionGemma-270M and Qwen-0.6B on single-shot calls
  • Weights, dataset, and training scripts are fully public
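To put the 26M figure in perspective, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need. The bytes-per-parameter values are the standard sizes for each format; which precision Needle actually ships in is not stated here, so the formats listed are assumptions for illustration only.

```python
# Standard storage sizes for common weight formats, in bytes per parameter.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# "26M" from the takeaways above, treated as a round 26 million (assumption).
NEEDLE_PARAMS = 26_000_000


def weight_footprint_mb(num_params: int, dtype: str) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_footprint_mb(NEEDLE_PARAMS, dtype):.0f} MB")
```

This counts weights only; activations, the KV cache, and runtime overhead come on top of it.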

Pulse Analysis

Needle signals a shift toward ultra-compact language models that run well on everyday devices. By distilling Gemini 3.1, a flagship multimodal model, into a 26-million-parameter Simple Attention Network, the developers have shown that capable tool calling does not require hundreds of millions of parameters. The training pipeline consumed 200 billion tokens on a cluster of 16 TPU v6e chips in just over a day, followed by a 45-minute fine-tuning pass on a curated function-call dataset. That schedule shows how current compute efficiency can produce a usable model at a fraction of traditional training cost.
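The training figures above imply a throughput that can be estimated with simple arithmetic. The sketch below assumes "just over a day" means at least 24 hours, so the distillation numbers are upper bounds, and it assumes the 2 billion fine-tuning tokens from the takeaways correspond to the 45-minute run. Variable names are illustrative.

```python
def tokens_per_second(total_tokens: float, hours: float) -> float:
    """Average throughput for a run that processes total_tokens in hours."""
    return total_tokens / (hours * 3600.0)


DISTILL_TOKENS = 200e9   # 200 billion tokens, as reported
DISTILL_HOURS = 24.0     # assumption: "just over a day" rounded down to 24 h
NUM_CHIPS = 16           # TPU v6e chips, as reported

FINETUNE_TOKENS = 2e9    # 2 billion tokens, from the takeaways
FINETUNE_HOURS = 0.75    # 45 minutes, as reported

distill_rate = tokens_per_second(DISTILL_TOKENS, DISTILL_HOURS)
per_chip_rate = distill_rate / NUM_CHIPS
finetune_rate = tokens_per_second(FINETUNE_TOKENS, FINETUNE_HOURS)

print(f"distillation: {distill_rate:,.0f} tokens/s across the cluster")
print(f"distillation: {per_chip_rate:,.0f} tokens/s per chip")
print(f"fine-tuning:  {finetune_rate:,.0f} tokens/s")
```

Under these assumptions the distillation run averages at most about 2.3 million tokens per second across the cluster, or roughly 145,000 per chip. The hardware used for fine-tuning is not stated, so no per-chip figure is derived for that stage.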

From a business perspective, Needle opens new avenues for companies that want to embed AI directly into consumer hardware such as smartphones, smartwatches, and AR glasses. Its local inference speed of 6,000 tokens per second for prefill and 1,200 tokens per second for decoding allows real-time responses without cloud APIs, which reduces latency, bandwidth costs, and data-privacy exposure. Because both the weights and the synthetic dataset are open source, developers can customize the model for niche tools and shorten time-to-market for specialized AI assistants in sectors ranging from finance to healthcare.
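As a rough illustration of what those speeds mean for a single tool call, the sketch below estimates end-to-end latency from the two reported rates. The prompt and output lengths are made-up example values, and the model ignores tokenization, model loading, and any other overhead. The hardware behind the reported rates is not given here.

```python
PREFILL_TOKENS_PER_SEC = 6_000  # reported prefill speed
DECODE_TOKENS_PER_SEC = 1_200   # reported decode speed


def estimated_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Prefill time for the prompt plus decode time for the generated tokens."""
    prefill_s = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode_s = output_tokens / DECODE_TOKENS_PER_SEC
    return prefill_s + decode_s


# Hypothetical request: a 600-token prompt (tool schemas plus the user query)
# and a 60-token function call as output.
latency = estimated_latency_s(prompt_tokens=600, output_tokens=60)
print(f"estimated latency: {latency * 1000:.0f} ms")
```

For this hypothetical request the estimate comes to about 150 ms, split between 100 ms of prefill and 50 ms of decoding.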

Needle outperforms larger models such as FunctionGemma-270M and Qwen-0.6B on single-shot function-call benchmarks, but it still trails them in broader conversational contexts, which highlights the trade-off between size and versatility. Even so, its results support the viability of "tiny AI" for targeted tasks and may prompt a wave of similar distillation projects. As edge AI adoption grows, models that combine open availability, a low compute footprint, and strong task-specific performance will matter to enterprises that want to differentiate their products while keeping operating costs in check.
