New MIT Framework Uses Search to Handle LLM Errors in AI Agents

Big Data • AI

EnterpriseAI (AIwire) • February 6, 2026

Companies Mentioned

LangChain

Why It Matters

By automating search‑based error recovery, EnCompass speeds development of reliable structured agents and lowers engineering overhead, accelerating adoption of LLMs in software‑intensive workflows.

Key Takeaways

  • EnCompass cuts manual error‑handling code by 80%.
  • Search strategies boost translation accuracy 15‑40%.
  • Framework separates workflow from inference‑time strategy.
  • Supports Beam Search, MCTS, or custom algorithms.
  • Targets program‑in‑control agents, not fully LLM‑driven ones.

Pulse Analysis

The rapid integration of large language models into enterprise software has exposed a persistent reliability gap: LLMs can return inconsistent or outright incorrect responses, and a single mistake can derail an entire automated workflow. Traditionally, developers have patched this weakness with ad‑hoc retry loops, output voting, or custom backtracking logic, often inflating codebases to the size of the original agent. EnCompass, presented at NeurIPS 2025 by MIT CSAIL and Asari AI, reframes the problem as a search over execution paths, allowing the runtime to automatically explore alternative LLM outputs and recover from failures.

At the heart of EnCompass is a lightweight annotation system that marks ‘branchpoints’ in a Python-defined workflow: places where an LLM call may produce divergent results. During execution, the framework compiles the function into a searchable graph and applies algorithms such as Beam Search, Monte Carlo Tree Search, or any user-supplied strategy to sample and score possible paths. Because the search layer sits outside the core logic, developers can swap strategies without touching the underlying code. In a Java-to-Python translation benchmark, the framework cut manual error-handling code by roughly 80% while lifting translation accuracy by 15% to 40%.

The modularity of EnCompass makes it a natural complement to existing LLM orchestration libraries like LangChain, especially for ‘program‑in‑control’ agents that follow a predefined sequence of subtasks. By abstracting inference‑time decision making, the framework lowers the barrier for teams to experiment with more aggressive search techniques, potentially shortening development cycles for data‑analysis pipelines, scientific simulations, and code‑generation tools. As enterprises seek to embed LLMs deeper into critical systems, tools that improve robustness without bloating code will be a decisive factor in scaling trustworthy AI solutions.

New MIT Framework Uses Search to Handle LLM Errors in AI Agents

When developers build AI agents that rely on large language models, they often face a tricky problem: The models can produce different outputs each time they are called, and some of those outputs are wrong. Recovering from those mistakes usually requires writing complex logic to retry steps or backtrack when something fails.
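
As a rough illustration (not code from the paper), that hand-rolled recovery logic often looks something like the retry loop below, where `call_llm` and `looks_valid` are hypothetical placeholders:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client call (e.g. via LangChain)."""
    raise NotImplementedError

def looks_valid(python_source: str) -> bool:
    """Placeholder check, e.g. the output parses and its unit tests pass."""
    return bool(python_source.strip())

def translate_with_retries(java_source: str, max_attempts: int = 3) -> str:
    """Ad-hoc recovery: retry the LLM call until the output passes validation."""
    prompt = f"Translate this Java code to Python:\n{java_source}"
    last_output = ""
    for attempt in range(max_attempts):
        last_output = call_llm(prompt)
        if looks_valid(last_output):
            return last_output
        # Fold the failure back into the prompt and try again.
        prompt += f"\n\nAttempt {attempt + 1} was rejected; please fix and retry."
    return last_output  # best effort after exhausting retries
```

Multiply this pattern across every LLM call in a workflow, plus logic to back up to earlier steps, and the recovery code can grow as large as the agent itself.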

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and startup Asari AI have introduced a software framework designed to simplify that process. The framework, called EnCompass, allows developers to add systematic search and backtracking to AI agent programs without rewriting large portions of code. The work was presented at the recent NeurIPS 2025 conference and is described in a paper, “ENCOMPASS: Enhancing Agent Programming with Search Over Program Execution Paths.”

“Our goal is to develop an inference-time strategy framework: a framework that makes it easy to experiment with different inference-time strategies independently of the design and implementation of the underlying agent workflow. Such a framework is intended not to replace, but to be used in conjunction with LLM prompting and tool use frameworks, such as LangChain,” the authors wrote.

EnCompass targets what the researchers call “program in control” agents. In these systems, a developer defines the overall workflow in code, such as the sequence of steps an agent follows to translate software, analyze data, or generate hypotheses. The large language model is used only at specific points to perform subtasks, rather than deciding the entire workflow itself. For this use case, the main challenge is handling inaccuracies in LLM outputs, as a single incorrect response can derail the whole process. Developers often address this by manually adding code that retries calls, compares multiple outputs, or returns to earlier steps. According to the authors, this extra logic can be as large and complex as the original agent code.
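
A minimal sketch of such a program-in-control workflow (illustrative only, reusing the `call_llm` placeholder from the earlier sketch) shows where the point of LLM failure sits:

```python
from pathlib import Path

def translate_repo(repo_dir: str) -> dict[str, str]:
    """Program-in-control: the code fixes the workflow; the LLM only translates."""
    translated: dict[str, str] = {}
    for java_file in sorted(Path(repo_dir).rglob("*.java")):
        java_source = java_file.read_text()
        # Only this subtask is delegated to the model; one bad response here
        # can derail the rest of the pipeline unless recovery logic is added.
        translated[str(java_file)] = call_llm(
            f"Translate this Java file to Python:\n{java_source}"
        )
    return translated
```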

EnCompass separates the agent’s workflow from the strategy used to explore different possible LLM outputs. Developers annotate parts of their code where an LLM call may produce variable results. These locations are called branchpoints. At runtime, EnCompass treats the agent’s execution as a search problem, exploring different execution paths that result from different LLM outputs. The framework enables backtracking over failed execution paths and can explore multiple execution paths in parallel, depending on the chosen search strategy. Developers can choose from common search strategies like Beam Search or Monte Carlo Tree Search, or define their own strategies, without changing the underlying workflow code.
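
The article does not reproduce EnCompass's actual API, but the separation it describes might look roughly like the sketch below, in which the runtime supplies the branchpoint handle and the search strategy is chosen outside the workflow (all names here are assumptions for illustration, not the published interface):

```python
from typing import Callable

# A branchpoint handle supplied by the runtime: given a sampler for one LLM
# call, it returns the candidate chosen for the current execution path and may
# revisit this point (backtrack) if the path later fails.
Branchpoint = Callable[[Callable[[], str]], str]

def translate_file(java_source: str, branch: Branchpoint) -> str:
    return branch(lambda: call_llm(f"Translate this Java code to Python:\n{java_source}"))

# The same workflow function could then be run under Beam Search, Monte Carlo
# Tree Search, or a custom strategy without editing translate_file itself.
```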

EnCompass works by compiling a Python function that defines an agent’s workflow into a search space. Each branchpoint marks a point where execution can diverge, so a search algorithm can sample execution paths, score each one against developer-defined criteria, and return the highest-scoring result.
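
As a toy stand-in for that compile-and-search step (not how EnCompass is actually implemented), the snippet below searches a single branchpoint: it samples several candidate outputs, scores the resulting paths with a developer-defined criterion, and returns the best one:

```python
from typing import Callable

def search_branchpoint(
    sample: Callable[[], str],      # draws one candidate LLM output
    score: Callable[[str], float],  # developer-defined scoring criterion
    n_candidates: int = 4,
) -> str:
    """Sample candidates at one branchpoint and keep the highest-scoring path."""
    candidates = [sample() for _ in range(n_candidates)]
    return max(candidates, key=score)

# Example usage, reusing the earlier placeholders:
#   best = search_branchpoint(
#       sample=lambda: call_llm(f"Translate this Java code to Python:\n{java_source}"),
#       score=lambda py: 1.0 if looks_valid(py) else 0.0,
#   )
```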

The researchers evaluated EnCompass on several agent tasks, including an agent that translates Java code repositories into Python. In that case study, adding search logic using EnCompass required about 80% fewer lines of code than implementing the same logic manually. The search enabled by EnCompass also improved translation accuracy by 15 to 40% when compared with a version of the agent that did not use search.

The authors say EnCompass is not designed for agents that are fully controlled by an LLM, where the model decides on each step. In those systems, there is no fixed workflow for EnCompass to compile into a search space. Instead, EnCompass is meant for developers and researchers building structured AI agents for tasks like code translation, automated analysis, or scientific workflows. By making search and backtracking a built-in runtime feature, the framework could make these agents more reliable and easier to experiment with as LLM-based systems are increasingly being used in software development.

In an MIT News article, co-author Armando Solar-Lezama, an MIT professor of EECS and CSAIL principal investigator, said, “As LLMs become a more integral part of everyday software, it becomes more important to understand how to efficiently build software that leverages their strengths and works around their limitations. EnCompass is an important step in that direction.”
