Video: [AMA] Debugging Deep Agents | Duration: 2984s | Summary: [AMA] Debugging Deep Agents | Chapters: Welcome & Introductions (51.605s), Deep Agents Introduction (164.63s), Interactive Session Guide (269.145s), Deep Agent Evolution (386.27s), Building Deep Agents (1350.66s), Managing Deep Agents (1921.815s), Session Conclusion (2691.42s)
Transcript for "[AMA] Debugging Deep Agents":
Hey, everyone. Hello, everyone. Let's see. Let's give it a minute or so for a few more people to trickle in, and then we will get started. Alright. We're at the three minute mark, and I see quite a few people trickling in. I think we can go ahead and get started. So, Lynn, do you wanna throw up the Yep. presentation? Awesome. So thanks everyone for joining. This is our second AMA session, and we have a really fun one for you today, deep agents. So with that, next slide, we'll go over some quick introductions. We'll go into the actual deep agent content. So we'll break this out into two different parts. We'll do more or less educational. So what are deep agents? What's all the hype about? What's the value add of using a deep agent into my framework and my production workflows? Best practice for developing deep agents, etcetera. And then we'll go into an actual live demo so you can see deep agents in action. Following the demo, we'll dive into the ask me anything portion, and then we'll close out with some insights into our upcoming events. Next slide. So I guess first things first, introductions. So who am I? I'm Carlos. I'm a member of the account management team here at Lanechain. I'm responsible for commercial accounts on the West Coast and the APAC region. So if my name or face isn't familiar to you, you've more or likely have partnered with my counterpart on the East Coast, Lauren Quilty. So if you've worked with Lauren, she and I, we're we're cuffing the same same cloth. A little about myself, so I joined Lanechain late last year. Prior to joining, I have experience in a myriad of technology disciplines ranging from AI on the machine learning and deep learning and data science side of the house to infrastructure as code, secrets management, tokenization, observability in APM, data center design and standardization. I've worked in fintech solutions architect, master data management. I've touched quite a bit, but I'm the more boring person of today's dynamic duo. So, Liam, I'll let you intro yourself. Cool. Hey, everyone. I'm Liam. I'm a deployed engineer at Linkchain. I mainly work with our customers to help them get onboarded and get them through their POC and set up with Linkchain. Before I was a deployed edge, I wasn't always a deployed edge here at Linkchain. I started around nine months ago as a customer engineer where I built a lot of agents and deep agents for our go to market team. Awesome. So, before we kick things off, a little housekeeping. If you'd like to ask a question, on the right hand side of your screen, you will see an interactive window with three different tabs, chat, docs, and q and a. So if you'd like to ask a question throughout the session, feel free to drop it in the chat tab. That'll keep it a little more interactive. You can see what people post questions people post. You can actually respond to them or provide insights as to how you attacked or addressed a particular issue or concern. Docs, that tab just contains some relevant links to some reference material that is, you know, respective to today's topic. And then q and a, if you like to stay a little more anonymous, you can share a question anonymously there. But during the actual ask me anything session, we will aggregate questions for both the chat and the the q and a tab. It's already been mentioned in the actual chat. So, yes, this session will be recorded and distributed. I think we're gonna post this out on our YouTube channel, once the cloud has its way with encoding and all the fun cloud stuff. And then share feedback. So following the session, we will kick out a link to, share feedback. You can provide feedback on topics you would like to see covered, overall format of the presentation, on the host. If you just wanna tell me I have a face for radio, by all means, share the feedback. We we welcome it. Next slide. So with that, I'll kick it over to Liam. Liam, what are all these deep agents I keep hearing about? Cool. Yeah. I guess before we get into deep agents and what they are today, I think it's important to go through the history of how we got here. So looking at the evolution of LingQIAN and LingQIANF, we started out building this node architecture, which was very deterministic. So you could build out any graph type that you would want. You would define LLMs or tools as nodes, connect these with edges. And then, you know, starting from your source node, you would go down multiple decision trees to get to your final output. Doing an abstraction layer on top of LangeGraph, we have a React agent. You can still use React agents today. It's still a common use case. And this is where you might see an input given to an LLM, and the LLM can cycle between its tools and reason. Hey. Should I call more tools and continue in this loop until it decides it's done? And it spits out the final answer. Abstracting on top of React agents, we now have deep agents. Deep agents are essentially a more sophisticated agent harness on top of our React agents. The way we think about agents today is it is LLM plus agent harness equals agent. And it's our more opinionated way to do a well, I guess, I could say, as we've seen agents evolve over time, we've given them more tools and abilities to then call other agents and spin up sub agents of themselves and call tools or create a plan and follow that plan to as it sees fit. So in deep agents, we've essentially taken our React agent and just given it more abilities to, one, spin up more agents, two, create a file system so it can write the files, edit files, and basically save memories that way, as well as create a plan for yourself and then follow that plan. So going more in-depth here, I talked about this agent harness, which is essentially this graph over here about what we wrap the LLM in to allow it to do any type of use case that you would think of. So we have our initial LLM here, and then we gave it the tools of a to do list file system. So, basically, when you can ask a question, it might come up with a plan first. And what it can do in this file is check off the plan as it goes through it. This helps to keep the agent on task to make sure it doesn't drift as it goes along. Additionally, we have a concept of sub agents. So what you might do is you might have your supervisor agent call sub agents, and this is really important mainly for context management. If you gave just one agent in, like, a React agent setup a bunch of tools and let it run for ten minutes, it would take up so many tokens. The agent might forget the plan or what's going on along the way. Subagents can be helpful because you get to offload this context to other LLMs, which will then synthesize, the information it finds and then return just a concise summary back to the supervisor agent so it can only use to deal with the context it needs before it can spit out its final answer. We've also built in a lot of middleware into our deep agents harness such as prompt caching, the virtual file system, specialized prompts. It's MCP compatible, and it has other guardrails, such as summarization where it's when the agent runs too long or the conversation goes too long. It will compact the state so that it can keep running. One thing about our deep agent's architecture is that it's model agnostic. So opposed to a lot of other frameworks out there, you can plug any model into this. So let's say Anthropic comes out with the new greatest model. You can use that model. Then if OpenAI comes out with the next best one the next month, you can plug that in here. It's entirely open source, and we've recently added a new feature to it where you can connect it to a sandbox. This is incredibly powerful because now the agents can run code themselves and execute it. For example, you might create this agent connect this agent up to a database, which can it can then create its own charts or whatever you want it to run within the sandbox and then send them out to you. So I guess a simple use case here of a re of a deep agent would be a research deep agent here. So we have our initial deep agent here. We gave them tools as well as three research sub agents or two research sub agents and one critique agent. So what this agent might do is you might give it a task. Hey. Research the best athletes for the twenty twenty six Winter Olympics here. So then it would come up with a plan on how to best go about doing this task, which might be, hey. I need to look up the snowboarders. I might need to look at the best skiers. I might need to look up the best all the gold medalists and all the countries with the most amount of medalists. So it might create all of these plans in its planning tool here and then go about executing them. So to do that, it might delegate four or five research agents in parallel with different tasks for each subagent. So then these subagents might go out and complete just their specific task and then report back to the research subagent here. Then this research sub agent might then call this critique tool to look at its final response before it spits out its final response. And once the critique agent leaves its feedback, it might rewrite the response and finally send out the answer there. Additionally, as this research sub agents are doing their research, they might write these specific memories to the file system. This might be important because, you know, let's say you wanna come back to this research in another thread, You could connect that next thread to a file system to have those memories persist over time. You might also have research agent one and research agent two write to the same file system, to the same document so they can kind of collaborate as they're doing research. So this allows you to do long term memory as you're storing more files along the way with your research. But, also, you can use local state for just a specific trace or message within that thread if you're not trying to persist multi thread memory. One last thing I would like to bring up here before I jump into my demo is DeepAgent skills. Just as how you can save memories within your DeepAgent file system, you can also save skills whether you wanna do that for one user or for all users. And we've seen skills really accelerate and improve the usage of these agents just because you might have a one or two pagers on how exactly you want to write code or how you want to go about research. So what the agent can do as it searches is it knows it has this list of skills here. And whenever a specific task comes up that relates to one of your file names here, then the agent will decide to open that file and read it. And this is more powerful than just putting all of these contexts from these files into your system prompt because that will dilute your system prompt and confuse your agent. So by allowing your agent to open these files only when they need, that supercharges the agent, whereas it only gets the context it needs when it actually needs it. So jumping off of that, I put together a little example here of how we go about creating deep agents. So coming in here, I can also share this out at the end of this video call here. But this is our DeepAgents one zero one notebook here. So all I did at first was import OpenAI API key here as well as use the Tablet API key. Tablet is a web search API that we use for agents. And then we just configured our model here and go ahead and run this. And then I can just create a deep agent. And to do this, this is super simple. From our deep agent package, we can import create deep agent. We can define our agent as a create deep agent, give it our model from above, which we're using GPT 5.4 nano, as well as a system prompt we provided here. And we can run it, and we can ask it a question as, what is deep agents in two sentences? And I'll run it again here. And we see DeepAgent are an AI system designed to accomplish tasks step by step with tool use, searching files, editing code, running subtasks rather than only answering questions. So that's just a basic use of creating a DeepAgent. This would be very similar to maybe a React agent if we gave it just one tool. But, really, what makes these DeepAgent special are the built in tools or the harness we built around it. So the built in tools that our DeepAgent has is write to do, read to dos, write file, refile, edit file, LS to see all of its files within the file system as well as a task tool to spin up its subagents. So here, we can create the DeepAgent again, and we can give a memory here. And we can just ask it. Since these tools are built in, we can just tell it to, hey. Plan before you act and save your work into files. So we haven't given it any tools here. But because these tools are baked into the deep agents package, we'll have them out of the box. So as you run this here, we should see that it comes up with a plan and then save stuff into the file system. Now one thing to think about with your use cases is, obviously, we have traditional React agents, which are just a simple tool in agent loop and a deep agent, which is planning the tools, the ability to spin up sub agents and write the files. React agents are typically much faster in terms of it doesn't need to come up with a plan and check off the boxes as it goes through red. So depending on your use case, if you want really fast q and a between your customers and your agent, I would stick with the typical React loop because those questions might take the agent five to ten seconds to respond, whereas deep agents, I think, are for longer running tasks that might take five, ten, twenty minutes to really do thorough research and get you the absolute correct answer and also deal with larger context windows. So here, we created a bunch of files under the guide directory here, but we could tell it to create these files wherever you want. We just specifically told it to create our files under this guide here. But the file system, you can do literally whatever you want with. And, also, it came up with this plan here. So those were the built in tools into our deep engine package, but you can also define all the tools that you want to add to your deep engine as well. So for this use case, I am doing research on the Winter Olympic Games. So we created a bunch of tools here. Search Olympic medals, search athlete history, search Olympic viewership stats, search sports history. And then what we did is we plugged these tools into our DeepAgent here. So we have our model, our tools, and our system prompt here. We then can run it, and we can see how The US performed in the recent Winter Olympics. And here, it went out, and it did research with all of its subagents and came out with the results here. So we can see the most watched event, Didn't list specific events for this variety. Milda Quetta with 23,500,000 viewers. Cool. Olympic history. So she won the most gold medals as the youngest Olympian. Cool. But what's nice about DeepAgents is this is a very targeted use case. And, for example, Claude Code is an example of a DeepAgent. But what's nice about our open source DeepAgent package is you can make this agent specific to any use case that you want. One specific DeepAgent that we have here at Linkchain is our go to market DeepAgent. What we did in our go to market deep agent is it has contacts on all of our customer accounts. So what we did is we created sub agents that would go to our Salesforce. We created sub agents that would go to our Gong calls. We created sub agents that would read our Slack messages with the customer. Created sub agents that would read the email history with the customer. That way, when I am prepping for a meeting, I can simply go to this agent and ask, hey. Who is this account? What are they trying to accomplish? What did we talk about in our last call? And, you know, do they have a renewal coming up? And that agent will spin up with sub agents to go do research. And in two minutes, I'll have all the account information I need, to do well in my next call. So that's just an example of an opinionated deep agent, not just your traditional coding deep agent that you might think of with Cloud Code. Here in this example, we're doing a deep agent on the Winter Olympics. So we just created tool calls in this one. We gave our deep deep agent access to these tool calls. But let's say we wanna do deeper research as, you know, we wanna restrict the content to the subagents. What we might do is we might create a stat research or subagent and give it a specific subset of the tools from above, as well as we might create an athlete profiler DeepAgent and give it access to one of the tools from above with their own custom system prompts. We might then take our main agent and give it these two sub agents as well as its specific system prompt and let it run. So here, it might call the two subagents. Those two subagents will then gather their respective information, pass it back to the orchestrator, and then the orchestrator will respond with this final response here with, for example, Sean White being one of the greatest winter limouses of all time and Chloe Kim as well with the half pipe. Coming down here, I'm talking about the file system again. So this is helpful for logging tool long tool outputs, multistep artifacts, and state that you wanna persist within different sessions. So, basically, whenever you come across across an artifact or amount of code or maybe, you know, you write code to save an image, you can take those artifacts and save them within the file system so that the next time you pull up a thread, whether you wanna scope these files to users or to everyone, you know, accessing your tools, you can save them to the file system so that any agent that can read the file system has access to these artifacts. So you wouldn't need to do anything special here. You would just need to tell it, hey. Write to the file system and using the baked in write file tool, it'll create that artifact for you in the file system. This is going over our long term memory here with our composite back end. So for to deal with the back end here, you can import these packages here. You can save these memories to local state, or you can spin up a sandbox or a Postgres storage to save these memories over time in terms of the infrastructure needed to actually, you know, work with the dev agent and store these memories. So once you do that, we're just using the local composite back end here, And this is how you would implement this, but there's more in the docs here. So what we're trying to accomplish here is in thread one, we might save the user preferences that, you know, I prefer concise bullet point answers. Remember this. And then in thread two, when we create a new thread, you know, we might say, how do I like my answers formatted? And then the agent should come back and say, hey. You like bullet point answers, which is here. And one way we're doing this at Linkchain, as I spoke about our go to market agent earlier, the go to market agent also has the ability to write emails. So what we have is whenever our AEs write emails, we can tell the agent, hey. I like bullet points in my emails. Keep them very concise is how I typically write emails. That way whenever any one of our respective AEs talks to the agent, the agent remembers that specific AE's preferences. So whenever you talk to your agent or your version of the agent, it knows exactly what you want over time. And then this is how you can stream them here, but this is in the docs. Additionally, this all automatically streams to Langsmith if you just set these environment variables here. And then this is our final deep agent here if you just run this all from scratch. And it'll go across and research all of these questions here and, you know, spin up their respective sub agents with the system prompt and finally up it down here. Oh, looks like we got a connection error here. But and then here's just a list of the full documentation that you can go to as you wanna create a DeepAgent yourself. With that, I'll conclude the technical presentation and leave it up to questions from here. I'm happy to answer any questions about anything DeepAgent related or anything agent building related. Let's see. We actually got a few questions. First one, in no particular order, how does the file system tool prevent cross user access? So I in specifically talking about how different users will scope it, within the package itself, we do have restrictions upon, like, how you could scope it per user, but it also depends on where you're storing it. Like, if you're just holding it in the agent state and someone has access to that thread and you don't scope that thread, they could get access to it. But you can add auth at the file system level to make sure that doesn't happen. Awesome. Next up, when the deep agent has access to a sandboxed back end, why does it still need a dedicated LS tool when it could just run that directly on the command line? It could totally do that. It's up to you for how, you know, you wanna prompt that. Obviously, it has its own ELS tool for, you know, keeping state of the file system. But depending on your file system there, you could also run the basic batch commands there to complete whatever file you want or create whatever file you want. By giving you access to the sandbox, it can you can, you know, broaden the range of this, allow it to do whatever you want with the terminal of the sandbox. Perfect. Oh, this is a multipart multipart question. So the stream and state bars in the ustream hook don't include timing data like what is displayed in Landsmith. What is the best way to pull this information to a React front end? I don't know if I understand that question. You should be seeing that data when you stream. This should because DeepAgents is built on top of Landgraf and React Agents are built on top of Landgraf, they should all be exactly the same. For example, you could build a DeepAgent with the granular LangGraph code. You can build your own custom configuration of DeepAgent. So everything there should be the same. K. What's your approach for evaluating deep agents, outcome only or trajectory checking? Yeah. So when you're building a deep agent, I would break this down into steps. So when I go about designing an agent and a deep agent specifically, I'll first think about what knowledge sources I need to pull from. So this is where I start with mainly my tools file. Like, I'll create a tools. Py file and work on creating my tools there to make sure I have access to all my knowledge sources, and what does my data look like when I call them via API. Then from there, I might design the individual sub agents to make sure that each of my individual sub agents are working, and I'll run evaluations on those sub agents. Then from there, I'll extract more and then finally create the deep agent, in which I'll slowly start to add sub agents and change my system prompt and how I want to go at planning so that each step, I'm confident in, hey. My tool calls are working. My sub agents are working. And then, you know, now my full dev agent should be working. And I would evaluate one on the sub agent level so that each sub agent is returning how you want. So you might trace in, create your basic dataset, create your offline evaluators, and run that loop there to look at how you make improvements to your subagent over time. And then I would do the same thing to the deep agent. And then what you said, I like what you said about trajectory. I think that's a very important evaluator for a deep agent. I would make sure it's calling the tools in the order that I want. I think it's also helpful to set up an annotation queue to look at this because the deep agent might be calling them in a more optimal order than you would think of in your trajectory LLM as a judge of prompt there. So as well as, you know, setting up a trajectory evaluator of what you think is correct, I would also do a lot of human annotation just to look at your deep agents, how is calling the tools in what order, and whether which, you know, sequence of tools is resulting in you getting the best answer. Perfect. Speaking of tools, are these additional tools like virtual file system and memory going to be available in the React agent flow? So you could do this. Like I said so Lang Graph makes React agents, and then we added more middleware to React agents to make a Deep Agent. So, basically, we added bells and whistles to a React agent. Because Deep Agent, if you look at the repository for it, it's a React agent with a bunch of built in middleware built in. So you can add the file system middleware to a React agent to accomplish the same task. Or, alternatively, if you really hate the file system middleware, you could create your own version of DeepAgent by taking a create agent and not adding the file system middleware to it. But this is just our custom configuration of what we think is best. How do we build deep agents on the web and store memory per user? So if you deploy with Landgraf, you'll get all that persistence built in or if you build with Landgraf deployments. If you deploy in Landgraf deployments, you'll get all of that built in. If not, you're going to have to set up your custom sandbox or which might be do dotana or Daytona or Modal. We're also releasing our own version of sandboxes, or, you know, you'd have to connect it to your own Postgres database just to save all these files and state over time. Nice. Oh, a best practice question. Best practice for how to set token usage limits per client, especially given deep agents can be long running and may hit usage caps while running. I don't believe you can set it at the token level. But what you can do is you can set the amount you can put a limit on the amount of tool calls and turns. So, for example, you can limit it to 10 tool calls or, like, 10 model calls to limit your tokens, but you can't say, hey. Add a million tokens. You have to stop unless you build some custom middleware that is aware of the token count along the way. But, currently, we typically recommend just doing it at the tool or model level, like, the amount of tool calls made and have a safe return at the end. So once you reach 10, don't automatically quit. Return what you have back to the supervisor agent, then let it respond. Another use case question. Is there a standard or built in way for displaying sub agent output in a UI? I'm looking into ways to render collapsible cards or pop ups for sub agents. Yeah. So if you go to our DeepAgent doc let's see. We have a section here for streaming, which will let you know how to stream sub agent progress, you know, the LM cools stream tool calls. And this is our how to guide on how to best stream a DeepAgent and display all of the steps along the way, within your React UI. So we have a stream SDK for this that should help you out. And FYI, that docs page that Liam just pulled up, that's also linked in the docs tab on the interactive window on the right hand side of your screen. Next question. For very long running agents, I. E. Three plus hours, are there any particular best practices you'd recommend? I think that's a loaded question. I think that my best practice for this would be to deploy on Lang Graph deployments because we specifically built it for long running agents like that so you won't have to deal with any of, like, the infra issues of, you know, a long running agent over three hours. It'll run on links with deployments, and you can connect whenever it's done, and it'll spit out the output for you. In terms of, you know, doing long run agents, I would also look a lot into observability for this just to see, you know, how they're performing over time. I think this is very use case specific about what your three hour agent is doing to make sure it's actually making progress along the way. So I would say you need to do a lot of evaluation to make sure you actually need a three hour agent. You know, could this be done more efficiently in one hour? Is this agent polling in the background? For example, like, it might be up and running, right, for three hours, but is it actually doing work for three hours and, you know, consuming tokens for three hours? I think that's a question there. So I guess my biggest concern with a long running agent is what infrastructure you have running behind it to make sure it's durable. And two, doing evals to make sure it's doing whatever you want and doesn't, you know, drift along the way. Speaking of Drift, how is state managed within deep agents? Would an async sub agent keep its state? So I think this depends on the back end you have here if you're running this locally or if you're deployed on links with deployments with that back end. So you might be saving this state in Postgres, you know, to your checkpoint as along the way. You might be saving these memories within a file system, within a sandbox if that's how you hooked it up, your DeepAgent. So I think there's multiple ways you might be saving state for a subagent. Typically, sub agents don't require that much state because they're kind of one and done. For example, the supervisor calls a sub agent to do do a task and return back to the main agent, which the main agent will save those memories over time. I think you could maybe call the sub agent via an API. If you have this deployed somewhere else, connect that sub agent to its own Postgres so that is saving memory there. I know that through our Langsbyte deployments it's also, like, entirely dependent on, like, what back end you have here. But in our Langsbyte deployments using Postgres, like, the sub agents will have a namespace for their memories in check pointers so that you can pause and resume from there. So using links with deployments in our Postgres there, your subagents can be paused and resume from their check pointers to resume memory. Other than that, like, it kinda depends if you're using a sandbox to save your memories with file system or you're using Postgres to save memories. Okay. Tangentals at that. How do we configure which skills my deep agents use? So for that, if this were live, I would ask you what are you trying to accomplish with your DeepAgent. So are you maybe you're saving skills per user. So, you know, you might scope skills with some type of auth as to, hey. These are Liam's skills, and Liam's saving these skills over time. And then you might also have some global skills that your agent also has access to depending on how let's see if we have, like, a doc here. I'm trying to show off, like, the all. But, basically, like, you can scope your skills to either the, you know, broad level for everyone or per user. So you might have some general skills that everyone's agent should be using, and then users might save their own preferences over time. So it just depends on how you wanna scope them and what the use case for your agent is as to, you know, like, what access to skills do you want your agent to have, and what are, like, the main use cases for your agent so it can read those relevant skills for it. I understand the structure of deep agents, but are the subagents within a deep agent also a deep agent? That's a great question, and the answer is it could be. So you can define a deep agent to be a subagent of a deep agent or the typically, they're React agents, but you can have a deep agent call a deep agent within itself. Can I remove the built in tools and middleware from DeepAgents? No. But you can just call a React agent and add your own middleware there. Can I pass custom context and metadata to a specific sub agent? For that, yes, you can if the agent if the question for that is, you know, do I wanna call a subagent with a smaller model that's more token efficient? Yes. You can. You could also potentially scope, like, the skills or, like, file system to the subagent level, I believe. Don't quote me on that. But I would also, in the system prompt for the deep agent, just tell it, hey. For this subagent, only pass it this context. Or, specifically, when you call this subagent, tell it specifically this. And then as for, like, the configuration, you can pass whatever model you want to it. What is the best way I'm sorry. You ahead. can continue. I'm just trying to show the graph that pie, which is just a create agent under the hood. Yeah. So a deep agent is just a create agent with our built in middleware here. So we basically imported all of these middlewares that are available if you go to, like, our middleware on our doc website. And then we just create a create agent and pass it all these things here. So that's what deep agent is under hood. It's just a create deep agent, with a bunch of bells and whistles that we think are best practices when creating agents. But you can continue, Carlos. Awesome. What's the best way to prevent repetitive tool calls or thinking loops? Yes. So, one, I think the best way to solve this without doing any work is use a smarter, more expensive model. But that's always not the best solution. Well, it's always the best solution, but it's not the most practical always. The next one is to do observability and evaluation. So this is coming back to Langsmith. So you wanna see how your model is performing, how your agent loop is going, seeing what tool calls it's calling. You could set up online evaluator for trajectory to make sure it's not calling the same tool twice. Or, for example, it shouldn't be searching the same question twice. So you could set up an evaluator for that and alert you on that. And then you can go back and change your prompt, change your models, and hope that evaluator goes up over time. And that's where it becomes a little bit of an art of the LLM. Does DeepAgents come with prebuilt capabilities, or do I still have to write them? Nope. DeepAgents comes with all of this out of the box. So if I come back here over to the docs, it comes with sub agents, human in the loop, memory, skills, all out of the box with its built in tools of LS and these as well as middleware such as, you know, compacting the context once it gets too long and some other middlewares as well, you know, such as, like, planning. Are DeepAgents free? I'll take that one. Yes. It is. It's an open source agent framework, and there is no cost for it. How is DeepAgents different from Clog code or OpenCode? It's not that different from Cloud Code other than, one, it's open source. And, two, you have the ability to go in and make it for whatever task you want. You're you're passing it the specific sub agents you wanna have. You're passing it the specific tools that you wanna have. DeepAgents is a harness around, you know, a model the same way ClaudeCode is a harness around Anthropic. And you can make deep agents do whatever you want. We just gave it all these bells and whistles that we think make a agent harness great, and you can make it for whatever use case you want. How long are your prompts usually with these production grade deep agents such as the lane chain GTM agent or the open SWE? Yeah. I think it depends, and then they grow over time as, you know, you try to optimize with your evaluators and guardrails. You know, I've seen some go up to, like, 2,000 lines, and then I've also seen somewhere around 500. My recommendation would be to start small and then expand it over time because, you know, you can't make it super long because then you'll see performance start to drop. So I would start small and just grow it over time and make sure your evaluations are going up over time. I think there's also a loaded question because it's entirely model dependent. For example, the Gemini models might not perform as well to your system prompt as anthropic models or vice versa. How to debug how to debug long running agents and see the activated steps, sub agents, loops in real time. Yes. So for this, I would go to Langsmith. I would go to my tracing project. Let me go to chat link chain, and I would click in here to see the trace view for all of my agents. So I might come in here. I might see the sub agents calling. I might see the tools it's calling and then set evaluators up here. So I really believe in this observability because without this, I would have no clue what's going on within my agent. So I think as your agent become longer running, these traces actually become more paramount. Without these traces, you would have no clue what's going on and seeing how they're drifting over time. And then what you can do is you can also set up LLMs to evaluate these traces or the sequence of these traces, so that you don't have to click into every one. And this kind of automates the process. So as you're getting thousands upon thousands of questions from your users, you can automatically have these LLMs detect and look at all of your traces as they're coming in to let you know how your agent is drifting over time, especially as you make different model changes, different prompt changes. It's really helpful to know how your agent is performing so you can out you, can Liam. forming opt, I quick. I I think you're sharing the wrong screen. We're we're seeing chat GPT. guess it's because I changed my browser. Let's see. Let me go back to the the workspace. Let me get to my screen. Now it's inception we see. There we go. just have to do the entire screen. Sorry about that, guys. So what I was talking about here is you can come in here to your trace view and see all the steps your model made along the way. So here, you might have guardrails. You know, it might be planning its things or planning its next steps and going along, executing them, calling sub agents. You can see the tool calls here for what tools you're calling in what order, you know, what the final response is here. And without this, you would have no clue what's going on within your agent. And then with all of these, as you get thousands of traces, you're not going to be able click into every single one to see how your agent is performing, especially with debates. You have you know, it could go sideways in many more ways just because leaving more up to the LLM and leaving more power in LLM to decide which way it should go in its research, which is why you could set up an LLM as a judge evaluator to then evaluate the steps that your LLM is making along the way. And each of these evaluators could. You can create a scoring rubric for them within your prompt here. And then, also, you can define a response format here and have it score the trajectory potentially of the LLM. And then with these scores, you can set up monitoring dashboards to see how your agent is performing over time. Then as you change out different system prompts and different model configurations, or maybe you added a different sub agent or a new tool call to a sub agent, you can see how your agent is performing here. How would you recommend checkpointing state if the agent can change the file system in parallel, I e, even if the agent state can be checkpointed, the file system in edit it may be changed. So I think this entirely depends on how you're setting up your agent. So are you doing, you know, one dedicated sandbox per thread? Is every thread that you have for all of your users talking to the same sandbox? Like, ideally, you recommend that you have each sandbox, you know, one sandbox per per user, and then the user can have multiple threads and connect to an existing one. So I would, you know, maybe in that sandbox, create a file system at the system level of the sandbox for each thread. So and then you can set up filters. So, like, for example, you know, you might create thread one two three four folder within the file system, and that'd be all the files created as the agent runs within that thread. And if you click back on that thread, the sandbox will reopen that specific file, and then the agent will have access to those files within the folder in the sandbox. And then, you know, if you create another thread, it will create a new folder at, like, the root level of the sandbox in which you could then add the files to that. And especially especially in that sandbox, you might also have at the root level a global folder, which, you know, you're saving memory across all threads in that sandbox. And then if you have a different user come in and you don't wanna scope them to that, you might, you know, preload in the system skills into that sandbox. But then you would have a completely separate instant for that sandbox for that agent to run within it. Alright. Two more questions. Can we con can we configure what context gets passed into a sub agent from the main agent's system prompts and files it has access to. Yes, but not definitively. And what I mean by that is you can tell it that, and you could eval on that to try to make sure that happens. But unless you're specifically writing code to scope the sub agents to a specific set of the files, you cannot guarantee that the sub agents won't access the files if it's all at the prompt level. You can tell it not to do something bad, but it still might eventually do something bad, which is why you could set up specific guardrails in place just to make sure that sub agent has no access to the file system, but you'd have the right custom code for that. Last question. Are there any downsides to staying model agnostic? Anthropic suggests their agent SDK is less flexible with other models because it's deeply optimized for their own. Do we lose meaningful advantages by not leaning into that specialization? I think the main downside is, you know, let's say someone builds an agent harness. For example, QuadCode is an agent harness specifically for the anthropic models. They've evaluated them. Oh, sorry about that. They've evaluated them and made sure, you know, they were efficient for those models there. And I think, you know, on our evaluation set, that might take it from, you know, 80 to 90% accuracy just because they made all those system prompt optimized for the anthropic models. But what could happen, you know, if Grock came out with the next best model that's even better than the most optimized version of Anthropic? Then if you switch over there, then you're you have to redo your whole agent harness. So what I think is nice about dAgents is you can pick the leading model of whatever you think is best and modify it at will, and you can optimize it yourself. You can build that agent harness since you're the one building it specifically around your model, specifically around your use case. For example, Cloud Code is optimized around the coding use case, but is it optimized around maybe your go to market use case. But with deep agents, you can run evaluators and make it your own. So that's what I'll say about that. Perfect. Alright. We're about ten minutes out. Let's go ahead and close out. Liam, if you could bring up the presentation once more. Yes. So I have a browser. There we go. So just to wrap up, additional resources. Again, these are all links that you guys may find helpful. These are also linked, again, in the interactive UI window, in the docs tab on the right hand side of your screen. Next, really quick regarding our events we have coming up. So Interrupt is our agent conference that's coming up in mid May. We'll have speakers. We'd love to have you join us. We would have speakers from industry leaders across various different, tech sectors such as Nvidia, Coinbase, Toyota, monday.com, just to name a few. So there's a QR code. Feel free to scan it and or, visit the linkedinterrupt.mainchain.com to sign up and come out and say hello. So with that, I think we'll wrap it. Thanks again for joining. We have some really good questions. Again, this session was recorded. The recording will be available shortly. Feel free to pass it along to any of your teammates that were unable to join us. Further reminder, these sessions are semimonthly. The next session is scheduled for April 15. We do alternate to support various time zones, so the next one will be at 8AM Pacific, eleven Eastern. And lastly, please don't forget to take a few seconds to provide feedback on today's session or a session you would like to see in the future. Thanks so much for joining and we'll see you at the next one.