News • Mar 2, 2026
Evals Skills for Coding Agents
Hamel Husain has released evals-skills, an open-source plugin that gives AI coding agents a toolbox for product-specific evaluation. The package introduces an eval-audit skill that inspects six diagnostic areas of an evaluation pipeline, along with a suite of targeted skills for error analysis, synthetic data generation, judge prompt creation, evaluator validation, RAG assessment, and building review interfaces. It is designed to complement existing MCP servers from vendors such as Braintrust, LangSmith, and Phoenix, so that agents can both run experiments and interpret the results. These reusable components are meant to help developers set up reliable monitoring of AI products faster, and the framework can be extended with custom, domain-specific skills.