Teaching Local Models to Call Tools Like Claude


Tomasz Tunguz
Nov 13, 2025

Key Takeaways

  • Distilled tool-calling from Claude to GPT‑OSS 20B
  • SemDeDup and CaR selected high‑impact training examples
  • DSPy and GEPA iteratively optimized prompts for accuracy
  • Match rate rose from 12% to 93% after three phases
  • Claude’s own consistency caps practical tool‑calling reliability
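The DSPy/GEPA-style optimization named in the takeaways boils down to a mutate-score-select loop. The sketch below is a greedy simplification (real GEPA maintains a Pareto frontier of candidate prompts rather than a single best one), and `evolve_prompt`, the mutation functions, and the scoring function are all illustrative names, not the author's actual pipeline:

```python
def evolve_prompt(seed_prompt, mutations, score, generations=3):
    """Greedy sketch of a DSPy/GEPA-style prompt-optimization loop.

    `mutations` are caller-supplied functions that rewrite a prompt;
    `score` would measure tool-call accuracy on a held-out set. Each
    generation applies every mutation to the current best prompt and
    keeps any variant that scores higher.
    """
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(generations):
        for mutate in mutations:
            candidate = mutate(best)
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best, best_score
```

In practice the score function is the expensive part: each evaluation means running the student model over an evaluation set and comparing its tool calls against the teacher's traces.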

Pulse Analysis

The rise of agentic AI systems has shifted focus from pure text generation to dynamic tool interaction. Knowledge distillation, long used to compress large language models, now finds a new frontier: teaching smaller models how to invoke external functions. By capturing Claude’s decision‑making during tool calls, researchers created a curated dataset that mirrors expert behavior, addressing the scarcity of high‑quality supervision for function calling.
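Capturing the teacher's decisions as supervised examples could look like the sketch below. The content-block shape (`type`, `name`, `input`) mirrors Anthropic-style message content, but the output record schema (`prompt`, `tool_calls`) is an assumption for illustration, not the dataset format the author used:

```python
import json

def to_training_record(prompt, content_blocks):
    """Flatten one teacher interaction into a JSONL training line.

    `content_blocks` follows the shape of Anthropic-style message
    content: a list of dicts with a "type" field, where tool
    invocations carry "name" and "input". The output schema here
    is illustrative.
    """
    tool_calls = [
        {"name": block["name"], "arguments": block["input"]}
        for block in content_blocks
        if block.get("type") == "tool_use"
    ]
    return json.dumps({"prompt": prompt, "tool_calls": tool_calls})
```

Appending one such line per interaction to a `.jsonl` file yields the raw distillation corpus that the curation steps below then prune.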

The team combined semantic deduplication (SemDeDup) and clustering‑retrieve (CaR) to prune redundant examples, ensuring each training instance contributed maximal signal. The prompt‑optimization frameworks DSPy and GEPA then iteratively refined the instruction set, testing mutations and keeping the most effective prompts. Across three phases (initial DSPy bootstrapping, GEPA expansion, and a final curated sweep), the local GPT‑OSS 20B model's alignment with Claude's tool‑call chain surged from a modest 12% to an impressive 93% match rate, demonstrating the power of systematic data curation and automated prompt evolution.
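The deduplication step can be reduced to a small sketch. SemDeDup proper first k‑means‑clusters the embeddings and prunes near-duplicates within each cluster for scalability; the single greedy pass below is the same idea at toy scale, and the 0.95 similarity threshold is an assumed value:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semdedup(embeddings, threshold=0.95):
    """Keep an example only if it is not a near-duplicate (cosine
    similarity >= threshold) of any example already kept; returns
    the indices of the survivors."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

On a real corpus the embeddings would come from a sentence-embedding model over each prompt/tool-call record, and clustering keeps the pairwise comparisons tractable.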

While the results showcase a viable path for open‑source LLMs to emulate proprietary tool‑calling capabilities, they also expose practical limits. Claude itself reproduces the same tool‑call sequence only about 50% of the time across repeated runs, so a high match rate against any single reference trace does not guarantee absolute accuracy. Nonetheless, reducing dependence on expensive APIs lets enterprises deploy autonomous agents on premises or at the edge, lowering operational costs and enhancing data privacy. Future work will need to address non‑determinism, expand tool vocabularies, and integrate real‑time feedback loops to establish distillation as a standard technique for scalable, trustworthy AI agents.
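The two headline numbers, a 93% match rate and roughly 50% teacher self-consistency, suggest metrics along these lines. Exact-sequence matching is an assumption about how alignment was scored, not a detail from the post:

```python
from collections import Counter

def match_rate(teacher_traces, student_traces):
    """Fraction of prompts on which the student reproduces the
    teacher's tool-call sequence exactly (a strict, assumed
    definition of a "match")."""
    pairs = list(zip(teacher_traces, student_traces))
    return sum(t == s for t, s in pairs) / len(pairs)

def self_consistency(repeated_traces):
    """Fraction of repeated runs on the same prompt that agree with
    the modal trace. A teacher near 0.5 here caps how meaningful a
    match rate against one reference trace can be."""
    _, count = Counter(repeated_traces).most_common(1)[0]
    return count / len(repeated_traces)
```

Traces are represented as tuples of tool names so they can be compared and counted; a richer comparison might also check the arguments passed to each tool.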
