Build Hour: AgentKit

OpenAI
Oct 29, 2025

Why It Matters

AgentKit dramatically lowers the technical barrier to building, deploying, and scaling specialized AI agents, enabling enterprises to accelerate automation initiatives and capture revenue gains faster.

Summary

All right, hi everyone. Welcome to OpenAI Build Hours. I'm Tasia, product marketing manager on the platform team, and I'm really excited to introduce our speakers for today: myself kicking things off, Samarth from our applied AI team on the startup side, and Henry, who runs product for the platform team. As a reminder, our goal with Build Hours is to empower you builders with the best practices, tools, and AI expertise to scale your company, your products, and your vision with OpenAI's APIs and models. You can see the schedule at the link below, openai.com/buildhours. Our agenda for today: I'll quickly go over AgentKit, which we launched a couple of weeks ago at DevDay, then hand it off to Samarth for an AgentKit demo. Henry will then run us through Evals, which really help bring those agents to life and let us trust them at scale. If we have time, we'll go over a couple of real-world examples, and we're definitely leaving time for Q&A at the end, so feel free to add your questions as we go.

Let's take a quick snapshot of what building with agents was like for the last several months or even year. It used to be super complex. Orchestration was hard: you had to write it all in code, and updating a version could introduce breaking changes. If you wanted to connect tools securely, you had to write custom code to do so. Running evals required you to manually extract data from one system into a separate eval platform, daisy-chaining all of these separate systems together to make sure you could actually trust those agents at scale. Prompt optimization was slow and manual. And on top of all that, you had to build UI to bring those agents to life, which takes another several weeks or months. Basically, it was in massive need of a huge upgrade, which is what we're doing here.
So with AgentKit, we hope we've made some real improvements to how you can build agents. Workflows can now be built with a visual workflow builder. It's versioned, so no breaking changes are introduced. There's an admin center called the Connector Registry where you can safely connect data and tools, and evals are built into the platform, including third-party model support. As Samarth will show us in a bit, there's an automated prompt optimization tool as well, which makes it easy to perfect those prompts automatically rather than by manual trial and error. And finally, we have ChatKit, a customizable UI.

Bringing it all together, this is the AgentKit tech stack. At the bottom we have Agent Builder, where you can choose which models to deploy the agents with, connect tools, write and optimize prompts, and add guardrails so the agents perform as you'd expect even when they get unexpected queries. You deploy that to ChatKit, which you can host yourself or with OpenAI, and then you optimize those agents at scale with real-world data from real humans, observing how they perform through our Evals platform.

We're already seeing startups, Fortune 500s, and everything in between using agents to build a breadth of use cases. Some of the more popular ones are customer support agents that triage and answer chat-based support tickets, sales assistants like the one we'll demo today, internal productivity tools like the ones we use at OpenAI to help teams work smarter and faster and reduce duplicate work, knowledge assistants, and even research, like document research or general research.
The screenshot on the right shows a few of the templates in Agent Builder covering some of the major use cases we're already powering. Okay, so let's make this real with a real-world example. A common challenge businesses face is driving and increasing revenue. Let's say your sales team is busy outbounding to prospects, building relationships, and meeting with customers. We want to build a go-to-market assistant to help save the sales team time and increase revenue. With that, I'll kick it over to Samarth to show us how to do it.

>> Great. One of the biggest questions we get at OpenAI is: how do we use OpenAI within OpenAI? Hopefully this pulls back the curtain a little so you can take a peek at how we actually build some of our go-to-market assistants. We'll cover a few different topics today: agents capable of data analysis, lead qualification, and outbound email generation. So what I'll do here is move over and share my screen. We're on our Atlas browser; feel free to download it. I've had a fantastic time using it these past few weeks. I think it's saved me hours, if not days, and I'm a big fan. Okay, so we'll get started. When we get into the Agent Builder platform, the first thing we see is a start node and an agent node. You can think of the agent as the atomic particle within the workflow you construct, and behind it is the Agents SDK, which powers the entirety of Agent Builder. Whenever we build these Agent Builder workflows, they don't have to live within the OpenAI platform. You can copy the code, host it on your own, and even take it beyond traditional chat applications by triggering these workflows via webhooks.
For this example, we have three agents in mind: a data analysis agent that pulls from Databricks, a lead qualification agent that scours the internet for additional details, and an outbound email generation agent, where we might want to enrich an email with details about a product or a marketing campaign we're launching. Sound good?

>> That sounds great. I'm on board.

>> Okay, great. So we'll get started by building our first agent. Since we have three different use cases in mind, we want to use a very traditional architectural pattern: a triage agent. The way we think about this is that agents are really good at specialized tasks; if we route each question to the proper sub-agent, we can get better responses. Let's call this first agent a question classifier. Typing is hard. I'll copy over the prompt we've prepared and take a quick peek at what it looks like. Really, we're asking the model to classify a question as either a qualification, a data, or an email question, so we can route the query depending on what the model selects as its output. And rather than a traditional text output, we want to force the model to output in a schema that we recognize and can use for the rest of the workflow. So let's call the variable the model outputs category and select the type as enum. This means the model will only output a selection from the list we provide here; from my prompt, that's the email agent, the data agent, and the qualification agent.

>> Great. And real quick, how did you write that prompt? Did you write it all yourself? I know how important the prompt is in steering the agent.
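The enum-constrained output the demo configures can be sketched without the Agent Builder UI. A minimal sketch in plain Python, assuming a hypothetical `parse_classification` helper standing in for the model call; the JSON schema mirrors the enum-typed `category` field set up on the canvas:

```python
import json
from enum import Enum


class Category(str, Enum):
    """The three routes the triage (question classifier) agent can pick."""
    EMAIL = "email"
    DATA = "data"
    QUALIFICATION = "qualification"


# JSON-schema equivalent of the enum output configured in the demo:
# the model may only answer with one of the listed values.
CATEGORY_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": [c.value for c in Category]},
    },
    "required": ["category"],
    "additionalProperties": False,
}


def parse_classification(raw: str) -> Category:
    """Validate a model response against the enum before routing on it."""
    payload = json.loads(raw)
    return Category(payload["category"])  # raises ValueError if off-schema
```

Downstream nodes can then branch on the enum member instead of free-form text, which is the point of forcing a schema rather than a traditional text output.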
How did you come up with it?

>> I think writing prompts is one of the most cumbersome things we do; there's a lot of time spent spinning wheels on what actually matters when you're capturing that initial prompt. One of the key ways I write prompts myself is to use ChatGPT and GPT-5 to create the v0 of the prompt. Within Agent Builder itself, you can also edit prompts or create them from scratch to use as the bare bones you iterate on later for your agent workflows. For now, we'll keep the one we pasted in, and through the rest of this workflow we'll take a peek at what using it looks like. So now that we've got the output, Agent Builder lets us make this workflow stateful. For example, I'll drag in a set-state node here (again, drag-and-drop can also be difficult). What we want to do is take the output value from the previous stage and assign it to a new variable so that the rest of the workflow can reference it. We'll call this category again and assign no default value for now. Using that same value, I can now conditionally branch to either the data analysis agent or the rest of my workflow, which handles additional steps before the email or customer qualification use case. We'll drag this agent in and set the conditional statement to say: if the state category is equal to data. Let's see. Oh, it looks like I spelled it wrong.

>> Debugging. Great.

>> As you can see, there are helpful hints where we can see what actually went wrong and really quickly go back and debug it.
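The set-state node and the conditional branch described here reduce to a few lines of logic. A sketch under stated assumptions (the node and agent names are hypothetical stand-ins for the canvas nodes in the demo):

```python
# Shared workflow state, playing the role of Agent Builder's set-state node.
state: dict = {}


def set_state(key: str, value: str) -> None:
    """Store an upstream node's output so later nodes can reference it."""
    state[key] = value


def route(question_category: str) -> str:
    """Conditional branch on the saved state: 'data' goes to the data
    analysis agent; everything else continues down the workflow."""
    set_state("category", question_category)
    if state["category"] == "data":
        return "data_analysis_agent"
    return "information_gathering_agent"
```

Saving the category into shared state, rather than branching directly on the classifier output, is what lets a later node (the email-vs-qualification split at the end of the demo) reference the same value without re-running the classifier.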
So in this case, if it's a data question, we'll route to that separate agent; if it's not, we'll use additional logic to scour the internet for the inbound leads we want to qualify or the email we want to write. Let's stick with the data analysis agent for now and go over what it's like to connect to external sources within Agent Builder and, largely, the Agents SDK. What I want to do here is instruct the model on how to use Databricks and create queries it can use in concert with an MCP server. So we've added a tool for the model to access this MCP server and query Databricks however it sees fit. If my query is really hard and might require joins, Databricks and GPT-5 can work together to create a concise query. Since I've built my own server, I'll add it here: I'll paste my URL first and call it the Databricks MCP server. Then I'll choose the authentication pattern. You can also select no authentication, but for protected resources, or anything living within authenticated platforms, you might want something like a personal access token for that last mile of federation. In this case, I'll use a personal access token I created within my Databricks instance and hit create. Let's give it a second to pull up the tools. We can see a fetch tool is surfaced here. This lets us select a subset of the functions the MCP server exposes, so the model doesn't get overwhelmed by the choice of potential actions it can take. I'll add that tool there. Oops. And I'll go back; one thing I missed here is actually setting the model.
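The MCP server registration in the demo boils down to a small configuration record: a label, a URL, an auth scheme, and an allow-list of tools. A sketch of that shape (the field names and URL are illustrative assumptions, not the exact Agent Builder schema):

```python
import os


def mcp_server_config(label: str, url: str, token: str,
                      allowed_tools: list) -> dict:
    """Build a config record for a remote MCP server.

    Restricting `allowed_tools` to a subset of what the server exposes
    keeps the model from being overwhelmed by possible actions, as in
    the demo's selection of just the fetch tool.
    """
    return {
        "type": "mcp",
        "server_label": label,
        "server_url": url,
        # Personal access token for the "last mile" of federation.
        "headers": {"Authorization": f"Bearer {token}"},
        "allowed_tools": allowed_tools,
    }


# Hypothetical Databricks MCP server; the URL is a placeholder.
config = mcp_server_config(
    label="databricks-mcp",
    url="https://example.com/mcp",
    token=os.environ.get("DATABRICKS_PAT", "dummy-token"),
    allowed_tools=["fetch"],
)
```

Keeping the token in an environment variable (rather than in the workflow definition) matters once the workflow is exported and hosted outside the OpenAI platform.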
If I wanted to make this really snappy, I could choose a non-reasoning model. But for this one, I really want the model to iterate on these queries and react to the results it gets back. So we'll do a quick test query to make sure the piping works. Maybe I'll say: show me the top 10 accounts. That should be good enough. We can see the model stepping through the individual stages of this workflow. In the beginning, it classified this as a data question, saved that state, and then routed. When it reached that agent and decided to use the tool, it asked us for consent to take that action. You can configure that logic on the front end to decide how to show the user that the model wants to take an action. With MCP you're able to do both read and write actions, and we have a number of these MCP servers out of the box, like Gmail, with a ton more you can connect to.

>> SharePoint.

>> Totally. And here we can see the model thinking about how to construct that query, and we get a response. We didn't ask the model to format the result for us, but we can do that really quickly within this agent itself
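The consent step, where the run pauses until the user approves a tool call, can be modeled as an approval callback. A minimal sketch, assuming hypothetical `approve` and `execute` callables supplied by the front end and the MCP client respectively:

```python
from typing import Callable


def run_tool_with_consent(tool_name: str, args: dict,
                          approve: Callable[[str, dict], bool],
                          execute: Callable[[str, dict], str]) -> str:
    """Ask the front end for consent before executing an MCP tool call.

    `approve` is whatever UI affordance you build ("the model wants to
    run X, allow?"); `execute` performs the actual read or write action.
    """
    if not approve(tool_name, args):
        return f"tool call '{tool_name}' declined by user"
    return execute(tool_name, args)


# Example wiring with stub callbacks standing in for the UI and the server.
result = run_tool_with_consent(
    "fetch",
    {"query": "show me the top 10 accounts"},
    approve=lambda name, args: True,          # user clicked "allow"
    execute=lambda name, args: "10 rows",     # stand-in for the MCP call
)
```

Gating both read and write actions behind the same callback is the simple way to honor the demo's point that MCP tools can mutate data, not just fetch it.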
by just asking the model, say, "I would like the results in natural language." By iterating with the generate button within Agent Builder itself, you can make these inline changes depending on the results you see in real time.

>> Super cool.

>> Cool. So the next thing I want to do is create another agent to do some of the research we mentioned, which might be useful for generating an email or qualifying a lead. We'll call this the information gathering agent. Looks like it's stuck here; I might have to give it a quick refresh in a moment. See, the platform's a bit buggy. Great. So we're at this information gathering agent, and what we want to do is tell the model how to search the internet for the leads we want. In particular, we're looking for a subset of the information that might be publicly available about a company: the company's legal name, the number of employees, the company description, maybe their annual revenue, as well as their geography. Again, we want to use a structured output to define what the result should look like when the model searches the internet, and we can instruct the model on how it should search. We also want to change the output format to the schema we're after, putting the fields we just listed into a structured output format. You can also add descriptions in the properties, but for now we'll leave those blank. So now, when the model routes to this information gathering agent, it will search the internet and output in the format we're looking for. Cool.
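The information gathering agent's output (legal name, employee count, description, revenue, geography) maps directly to a structured-output schema. A sketch of that schema plus a validator, with field names that are my assumptions for the demo's blank-description properties:

```python
import json

# Structured-output schema for the information gathering agent.
COMPANY_SCHEMA = {
    "type": "object",
    "properties": {
        "legal_name": {"type": "string"},
        "employee_count": {"type": "integer"},
        "description": {"type": "string"},
        "annual_revenue": {"type": "string"},
        "geography": {"type": "string"},
    },
    "required": ["legal_name", "employee_count", "description",
                 "annual_revenue", "geography"],
    "additionalProperties": False,
}


def validate_company(raw: str) -> dict:
    """Check a model response has every required field before using it."""
    payload = json.loads(raw)
    missing = [f for f in COMPANY_SCHEMA["required"] if f not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return payload
```

Marking every field required and disallowing extras is what makes the downstream email agent safe to template against this output.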
Since we saved the state of the query routing in the beginning, we can reference it again when routing either to the email agent or to the lead enhancement agent. So we'll set this condition equal to email, and otherwise route to the other agent.

>> Awesome. Yeah, and the sub-agent architecture is great because it means you get better-quality results a bit faster than you would with one general-purpose agent, which is helpful for actually having impact and making the sales team more productive.

>> What we'll do here is paste in a prompt for this email agent. The real highlight for the email agent is that we're looking to generate emails that are not just from...

Original Description

Introducing AgentKit: build, deploy, and optimize agentic workflows with a complete set of tools. This Build Hour demos how to design workflows visually and embed agentic UIs to create multi-step, tool-calling agents faster.
Samarth Madduru (Solutions Engineering), Tasia Potasinski (Product Marketing), and Henry Scott-Green (Product, Platform) cover:
• Build with Agent Builder: a visual, canvas-based orchestration tool
• Deploy with ChatKit: an embeddable, customizable chat UI
• Optimize with new Evals capabilities: datasets, trace grading, automated prompt optimization
• Real-world examples from startups to Fortune 500 companies like Ramp, Rippling, HubSpot, Carlyle, and Bain
• Live Q&A
👉 Sign up for upcoming live Build Hours: https://webinar.openai.com/buildhours/
00:00 Introduction
04:50 Agent Builder
21:27 ChatKit
24:53 Evals
35:17 Real World Examples
