Achieving near‑Claude tool‑calling performance with an open‑source model reduces reliance on costly proprietary APIs and accelerates deployment of autonomous agents in resource‑constrained environments.
The rise of agentic AI systems has shifted focus from pure text generation to dynamic tool interaction. Knowledge distillation, long used to compress neural networks, now finds a new frontier: teaching smaller models how to invoke external functions. By capturing Claude's decision‑making during tool calls, researchers created a curated dataset that mirrors expert behavior, addressing the scarcity of high‑quality supervision for function calling.
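A distillation dataset of this kind pairs each user request with the teacher's recorded tool‑call chain. The schema below is a hypothetical illustration of what one such training record might look like, not the authors' actual format:

```python
import json

def make_example(request, tools, teacher_calls):
    """Build one distilled training record: the user request, the tool
    schemas the teacher (Claude) could see, and the ordered tool calls
    it actually made. Field names here are illustrative, not the
    dataset's real schema."""
    return {
        "messages": [{"role": "user", "content": request}],
        "tools": tools,
        "teacher_tool_calls": teacher_calls,  # the chain the student imitates
    }

example = make_example(
    request="What's the weather in Paris tomorrow?",
    tools=[{"name": "get_weather",
            "parameters": {"city": "string", "date": "string"}}],
    teacher_calls=[{"name": "get_weather",
                    "arguments": {"city": "Paris", "date": "tomorrow"}}],
)
print(json.dumps(example, indent=2))
```

Records like this give the student model direct supervision on *which* tool to call and *with what arguments*, which plain text corpora never provide.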
The team combined semantic deduplication (SemDeDup) and clustering‑retrieve (CaR) to prune redundant examples, ensuring each training instance contributed maximal signal. The prompt‑optimization frameworks DSPy and GEPA then iteratively refined the instruction set, mutating prompts and keeping the most effective variants. Across three phases—initial DSPy bootstrapping, GEPA expansion, and a final curated sweep—the local GPT‑OSS 20B model's alignment with Claude's tool‑call chain surged from a modest 12% to an impressive 93% match rate, demonstrating the power of systematic data curation and automated prompt evolution.
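The core idea behind SemDeDup-style pruning can be sketched in a few lines: embed each example and drop any example whose embedding is nearly identical (by cosine similarity) to one already kept. The real method clusters first and dedups within clusters; this flat greedy pass over toy 2‑D vectors is a simplified sketch:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semdedup(embeddings, threshold=0.95):
    """Greedy semantic dedup: keep an example only if it is not too
    similar (cosine >= threshold) to any already-kept example.
    Simplified sketch of SemDeDup, which clusters before deduping."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Toy "embeddings": items 0 and 1 are near-duplicates, item 2 is distinct.
embs = [(1.0, 0.0), (0.999, 0.01), (0.0, 1.0)]
print(semdedup(embs))  # -> [0, 2]
```

In practice the embeddings would come from a sentence encoder and the clustering step keeps the pairwise comparisons tractable at dataset scale.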
While the results showcase a viable path for open‑source LLMs to emulate proprietary tool‑calling capabilities, they also expose practical limits. Claude itself exhibits roughly 50% consistency, meaning a high match rate does not guarantee absolute accuracy. Nonetheless, reducing dependence on expensive APIs opens opportunities for enterprises to deploy autonomous agents on‑premises or at the edge, lowering operational costs and enhancing data privacy. Future work will need to address non‑determinism, expand tool vocabularies, and integrate real‑time feedback loops to solidify distillation as a standard technique for scalable, trustworthy AI agents.
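A match rate like the 93% reported above can be computed by comparing the student's tool‑call chain against the teacher's, prompt by prompt. The strict exact‑match metric below is a hypothetical sketch; the post's actual evaluation may score partial matches differently:

```python
def chain_matches(student_calls, teacher_calls):
    """Exact-match comparison of two tool-call chains: same length,
    same tool names and arguments, in the same order. A deliberately
    strict metric; real evaluations may be more lenient."""
    return len(student_calls) == len(teacher_calls) and all(
        s["name"] == t["name"] and s["arguments"] == t["arguments"]
        for s, t in zip(student_calls, teacher_calls)
    )

def match_rate(pairs):
    """Fraction of (student_chain, teacher_chain) pairs that match."""
    return sum(chain_matches(s, t) for s, t in pairs) / len(pairs)

teacher = [{"name": "search", "arguments": {"q": "llm distillation"}}]
good = [{"name": "search", "arguments": {"q": "llm distillation"}}]
bad = [{"name": "browse", "arguments": {"url": "example.com"}}]
print(match_rate([(good, teacher), (bad, teacher)]))  # -> 0.5
```

Note that under a metric like this, the teacher's own non‑determinism caps the achievable score: if Claude reproduces its own chain only about half the time, a student matching one sampled reference chain 93% of the time is agreeing with a moving target, not a ground truth.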