By standardizing agentic inference, Open Responses reduces fragmentation and accelerates development of autonomous AI applications across the ecosystem.
The AI landscape is moving beyond turn‑based chatbots toward autonomous agents that reason, plan, and act over extended horizons. Existing Chat Completion endpoints were built for simple conversational exchanges and struggle to convey the richer, multi‑step interactions required by modern agents. Open Responses addresses this gap by offering an open, community‑driven standard that aligns with the emerging needs of agentic workflows, providing a common contract for developers and inference providers alike.
Technically, Open Responses builds on OpenAI’s March 2025 Responses API, adding open‑source accessibility and several key enhancements. It supports structured outputs—including text, images, JSON, and dedicated video tasks—while streaming reasoning as discrete semantic events rather than raw text deltas. The spec also formalizes sub‑agent loops: models can invoke internal or external tools, receive results, and continue reasoning within a single request, all governed by parameters such as max_tool_calls and tool_choice. This design simplifies integration for routers and model hosts, allowing them to adopt the standard with minimal code changes.
For the broader industry, the adoption of Open Responses promises reduced fragmentation, faster time‑to‑market for complex AI products, and a clearer path for interoperability among competing inference platforms. Developers gain a consistent API to build multi‑step AI assistants, while providers can differentiate through extensions without breaking compatibility. As the ecosystem coalesces around this open standard, we can expect accelerated innovation in autonomous agents, richer multimodal applications, and a more collaborative AI infrastructure.
Initiated by OpenAI, built by the open‑source AI community, and backed by the Hugging Face ecosystem
The era of the chatbot is long gone, and agents now dominate inference workloads. Developers are shifting toward autonomous systems that reason, plan, and act over long time horizons. Despite this shift, much of the ecosystem still uses the Chat Completion format, which was designed for turn‑based conversations and falls short for agentic use cases. The Responses format was designed to address these limitations, but it is closed and not as widely adopted; Chat Completions remains the de facto standard despite the alternatives.
This mismatch between agentic workflow requirements and entrenched interfaces motivates the need for an open inference standard. Over the coming months, we will collaborate with the community and inference providers to implement and adapt Open Responses into a shared format that can practically replace Chat Completions.
Open Responses builds on the direction OpenAI has set with their Responses API launched in March 2025, which superseded the existing Completion and Assistants APIs with a consistent way to:
Generate text, images, and JSON‑structured outputs
Create video content through a separate task‑based endpoint
Run agentic loops on the provider side, executing tool calls autonomously and returning the final result
Open Responses extends and open‑sources the Responses API, making it more accessible for builders and routing providers to interoperate and collaborate on shared interests.
Key points:
Stateless by default, supporting encrypted reasoning for providers that require it.
Standardized model‑configuration parameters.
Streaming is modeled as a series of semantic events, not raw text or object deltas.
Extensible via configurable parameters specific to certain model providers.
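To make the "semantic events" point concrete, here is a minimal sketch of consuming such a stream. It assumes the wire format is server‑sent events whose `data:` payload is a JSON object carrying a `type` field, as in the event examples shown later in this post; the parsing helper itself is illustrative, not part of the specification.

```python
import json

def parse_sse_lines(lines):
    """Yield decoded event dicts from an iterable of SSE lines.

    Each semantic event arrives as a `data:` line containing a JSON
    object with a "type" field; blank lines and the terminal [DONE]
    sentinel are skipped.
    """
    for line in lines:
        if line.startswith("data:"):
            body = line[len("data:"):].strip()
            if body and body != "[DONE]":
                yield json.loads(body)

raw = [
    'data: {"type": "response.reasoning.delta", "delta": "hmm"}',
    "",
    "data: [DONE]",
]
events = list(parse_sse_lines(raw))
```

Because each event is typed, a client can route reasoning, text, and tool‑state events to different handlers instead of concatenating raw deltas.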
We’ll briefly explore the core changes that impact most community members. If you want to deep‑dive into the specification, check out the Open Responses documentation.
Client requests to Open Responses are similar to the existing Responses API. Below is a request to the Open Responses API using curl. We’re calling a proxy endpoint that routes to inference providers using the Open Responses schema.
curl https://evalstate-openresponses.hf.space/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $HF_TOKEN" \
-H "OpenResponses-Version: latest" \
-N \
-d '{
"model": "moonshotai/Kimi-K2-Thinking:nebius",
"input": "explain the theory of life"
}'
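The same request can be built in plain Python. This is a sketch using only the standard library; the endpoint and header names are taken from the curl example above, and a real client would typically use a dedicated HTTP or SDK library instead.

```python
import json
import os
import urllib.request

# Endpoint and headers mirror the curl example above.
BASE_URL = "https://evalstate-openresponses.hf.space/v1/responses"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble a POST request for the Open Responses proxy endpoint."""
    payload = {"model": model, "input": prompt}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('HF_TOKEN', '')}",
        "OpenResponses-Version": "latest",
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers=headers,
        method="POST",
    )

req = build_request("moonshotai/Kimi-K2-Thinking:nebius", "explain the theory of life")
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```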
Clients that already support the Responses API can migrate to Open Responses with relatively little effort. The main changes are:
Migrating reasoning streams to use extensible “reasoning” chunks rather than “reasoning_text”.
Implementing richer state changes and payloads – for example, a hosted Code Interpreter can send a specific interpreting state to improve Agent/User observability.
For model providers, implementing the changes for Open Responses should be straightforward if they already adhere to the Responses API specification. For routers, there is now the opportunity to standardize on a consistent endpoint and support configuration options for customization where needed.
Over time, as providers continue to innovate, certain features will become standardized in the base specification.
In summary, migrating to Open Responses will make the inference experience more consistent and improve quality as undocumented extensions, interpretations, and workarounds of the legacy Completions API are normalized in Open Responses.
The request below enables streaming, so reasoning is returned as a series of chunked events.
{
"model": "moonshotai/Kimi-K2-Thinking:together",
"input": [
{
"type": "message",
"role": "user",
"content": "explain photosynthesis."
}
],
"stream": true
}
Here’s the difference between Open Responses and the Responses API for reasoning deltas:
{
"delta": "heres what i'm thinking",
"sequence_number": 12,
"type": "response.reasoning.delta", // changed from response.reasoning_text.delta
"item_id": "msg_cbfb8a361f26c0ed0cb133b3c2387279b3d54149a262f3a7",
"output_index": 0,
"obfuscation": "0HG8OhAdaLQBg",
"content_index": 0
}
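During the migration window a client may encounter both event names, so a small shim is useful. This is an assumption‑laden sketch: it accepts both the legacy Responses name and the Open Responses name shown above, and ignores everything else.

```python
# Migration shim: accept both the legacy Responses event name and the
# Open Responses one, so a client works against either API during rollout.
REASONING_DELTA_TYPES = {
    "response.reasoning.delta",       # Open Responses
    "response.reasoning_text.delta",  # legacy Responses API
}

def extract_reasoning(event: dict):
    """Return the reasoning text delta if this event carries one, else None."""
    if event.get("type") in REASONING_DELTA_TYPES:
        return event.get("delta")
    return None
```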
Open Responses distinguishes between Model Providers (those who provide inference) and Routers (intermediaries that orchestrate between multiple providers).
Clients can now specify a provider along with provider‑specific API options when making requests, allowing routers to orchestrate requests between upstream providers.
Open Responses natively supports two categories of tools:
Internal tools – hosted within the model provider’s system (e.g., OpenAI’s file search, Google Drive integration).
External tools – executed outside the provider’s system (e.g., client‑side functions or separate MCP servers).
With internal tools, the tool call, execution, and result retrieval all happen within the provider’s infrastructure, requiring no developer intervention.
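As an illustration, an external function tool is declared by the client in the request’s `tools` array. The shape below follows the Responses API convention that Open Responses extends; the function name and schema are purely illustrative.

```json
{
  "tools": [
    {
      "type": "function",
      "name": "get_quarterly_sales",
      "description": "Fetch sales figures for a given quarter",
      "parameters": {
        "type": "object",
        "properties": {
          "quarter": { "type": "string" }
        },
        "required": ["quarter"]
      }
    }
  ]
}
```

Internal, provider‑hosted tools are typically referenced by type alone, with any configuration handled on the provider side.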
Open Responses formalizes the agentic loop, typically a repeating cycle of reasoning, tool invocation, and response generation that enables models to autonomously complete multi‑step tasks.

[Diagram: the agentic loop (image source: openresponses.org)]
The loop operates as follows:
The API receives a user request and samples from the model.
If the model emits a tool call, the API executes it (internally or externally).
Tool results are fed back to the model for continued reasoning.
The loop repeats until the model signals completion.
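The four steps above can be sketched in client‑side Python. This is a toy sketch, not the specification: `sample_model` and `execute_tool` are placeholders for a real inference call and a real tool runtime.

```python
def run_agent_loop(user_input, sample_model, execute_tool, max_tool_calls=5):
    """Repeat sample -> execute tool -> feed results back until the model
    emits something other than a tool call (or the call cap is reached)."""
    items = [{"type": "message", "role": "user", "content": user_input}]
    calls = 0
    while True:
        output = sample_model(items)           # 1. sample from the model
        items.append(output)
        if output.get("type") != "tool_call":  # 4. model signals completion
            return items
        if calls >= max_tool_calls:            # cap iterations (max_tool_calls)
            return items
        calls += 1
        result = execute_tool(output)          # 2. execute the tool call
        items.append(result)                   # 3. feed the result back

# Toy stand-ins: the "model" asks for one tool call, then answers.
def toy_model(items):
    if any(i.get("type") == "tool_result" for i in items):
        return {"type": "message", "role": "assistant", "content": "done"}
    return {"type": "tool_call", "name": "search_documents"}

def toy_tool(call):
    return {"type": "tool_result", "output": "3 documents found"}

trace = run_agent_loop("summarize Q3 sales", toy_model, toy_tool)
```

The returned `trace` preserves every intermediate item, mirroring how a provider‑side loop reports its work back to the client.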
For internally‑hosted tools, the provider manages the entire loop—executing tools, returning results to the model, and streaming output. This means that multi‑step workflows like “search documents, summarize findings, then draft an email” use a single request.
Clients control loop behavior via max_tool_calls to cap iterations and tool_choice to constrain which tools are invocable:
{
"model": "zai-org/GLM-4.7",
"input": "Find Q3 sales data and email a summary to the team",
"tools": [...],
"max_tool_calls": 5,
"tool_choice": "auto"
}
The response contains all intermediate items: tool calls, results, reasoning.
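A client can therefore inspect the full trace after the fact. As a small sketch (the `type` values follow the loop description above; exact names may differ per provider), grouping items by type gives a quick summary of what the loop did:

```python
def summarize_output(output_items):
    """Count response items by their "type" field, e.g. reasoning vs. tool calls."""
    counts = {}
    for item in output_items:
        key = item.get("type", "unknown")
        counts[key] = counts.get(key, 0) + 1
    return counts

items = [
    {"type": "reasoning"},
    {"type": "tool_call"},
    {"type": "tool_result"},
    {"type": "message"},
]
summary = summarize_output(items)
```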
Open Responses extends and improves the Responses API, providing richer and more detailed content definitions, compatibility, and deployment options. It also offers a standard way to execute sub‑agent loops during primary inference calls, unlocking powerful capabilities for AI applications. We look forward to working with the Open Responses team and the community at large on future development of the specification.

You can try Open Responses with Hugging Face Inference Providers today. An early‑access version is available on Hugging Face Spaces – try it with your client and the Open Responses Compliance Tool!