3 Important Lessons For Creating Production Grade Agents
David (00:00)
Hey everybody, this is Prompt and Circumstance. My name's David.
Ilan (00:03)
and I'm Ilan.
David (00:04)
And today we're going to go over three important things that you ought to know when going from POCs to production for your AI agents.
Ilan (00:26)
Alright David, I've got an interesting one for you today while we enjoy maybe the coldest winter in the history of our area, at least in recent history. So far.
David (00:39)
So far. The coldest winter so far.
Ilan (00:45)
And you know, winter time means you're spending a lot of time inside trying stuff out. I found a really cool tutorial for building a competitive analysis agent. And having gone through that tutorial and hit some stumbling blocks myself, I learned some pretty important lessons on breaking down an agent into sub-agents and improving your prompts. And I wanted to share those today.
David (01:14)
Sounds good. Sounds like that's something you can't do with a human. Can't break a human into subhumans.
Ilan (01:19)
That's right. Well, not yet, not yet. OpenAI has been quiet lately; we'll see what they come up with next year.
David (01:23)
Not yet.
Let's see what the future holds.
Ilan (01:31)
This week's episode is brought to you by Querio. Have you ever found that your data team is bogged down in ad hoc questions from users, or that your product team can't quite get the answers they're looking for out of the data? Querio's AI agent sits on top of your data stack and allows you to ask natural language questions to get the answers that you need immediately.
Try it today by going to querio.ai, and let them know that David and Ilan sent you for two months free of their Explore product. That's a thousand-dollar value.
Ilan (01:59)
All right, David. So I've been playing around quite a lot in n8n. This is my preferred workflow and AI automation tool, both for personal use and at work. Right here, what we're seeing is a market research agent, which
came out of a tutorial done by Pavel Huryn from Product Compass. And we'll link that in the show notes. And it seemed pretty cool, right? It grabs your competitors, it does some thinking, it goes to Perplexity to do some research, and then compiles all of this and then sends you an email at the end. So it all seems well and good. However, in my experience, when I used this agent, what I found was that it missed
important competitor news. So for example, for one of the key competitors at my job, it missed a hundred-million-dollar Series C raised by that competitor just a couple of days before.
David (02:43)
Hmm.
That's something that I
suspect a human certainly wouldn't miss.
Ilan (03:02)
That's right, that's right. So this led me to go down a rabbit hole of how to improve the agent's performance. The first thing that I did was actually just go through the tool logs.
So this is a cool thing that you can do in n8n, which is when you have a session running, you can see what gets returned at each step, and you can analyze what the input and the response was from the agent.
So what I ended up finding is that the agent was giving kind of like superficial requests to Perplexity. So it's like, okay, I gotta get a little deeper and give it more specific types of news or types of information that I'm hoping to collect from my competitors.
So the first lesson here is actually not about splitting agents; it's about analyzing your logs and trying to understand where your agent went wrong. It takes a little bit of research and time, but it's really worthwhile because it gives you that level of insight about what's going on.
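The inspection loop described here, where you look at each step's input and response after a run, can be sketched in plain code. This is a toy illustration of the observability idea, not n8n's API; `logged_step` and the query-building lambda are hypothetical stand-ins.

```python
# Record each agent step's input and output, the way n8n lets you
# inspect what each node received and returned after a session runs.
steps_log: list[dict] = []

def logged_step(name: str, fn, payload):
    """Run one step and append its input/output to the log."""
    result = fn(payload)
    steps_log.append({"step": name, "input": payload, "output": result})
    return result

# Example: a stand-in for the "build search query" step that turned out
# to be producing superficial requests to Perplexity.
query = logged_step(
    "build_query",
    lambda competitor: f"news about {competitor}",
    "Acme Corp",
)
```

Reading the log afterwards is what surfaces a problem like "the query was too generic" before you touch any prompts.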
David (04:15)
You know, a term that's been mattering a lot these days, I'd say probably in the last six months or so, is observability and evals. And it sounds like this is right up that alley.
Ilan (04:29)
Yeah, absolutely. Part of the reason that I really like n8n is that it has out-of-the-box observability. And when we're talking about observability, it's really that level of understanding: hey, what happened during this LLM call? What got passed in? What got returned? What did the agent do with that information? What was its next step, et cetera. So you can really understand it step by step.
David (04:51)
Great.
Yeah, being able to inspect what otherwise would have been a multi-step process.
Ilan (04:59)
That's right. And I also want to be careful here. I think this is something that sounds like it could be really on the engineering side. Like, I just hand this off to my engineering team and it's their job to look through the logs, right? But
In the case of AI and agents, at least where we are today, I really think this falls into the realm of what the product team should be handling because this all falls into the domain of what we can control as product managers.
And we usually are the subject matter experts in what information should be returned from your agent. So an engineering team might not know that, hey, I was expecting to see a fundraising announcement from this competitor.
David (05:42)
Mm-hmm.
Ilan (05:42)
So the first thing that I did was make a much more specific system prompt.
We're gonna share this whole workflow with you, and so you'll be able to see that system prompt in detail. But basically, the main change was this: instead of telling the agent, hey, you're a market research agent, come up with a good prompt for Perplexity to research competitors, we said, here are the types of information that we're looking for for competitors, so go to Perplexity and search for these types of information for each of the competitors.
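That change can be sketched as turning an enumerated list of information types into one targeted search per category. A minimal sketch; the category names below are illustrative examples, not the exact wording from the shared workflow's system prompt.

```python
# Instead of one vague "research this competitor" instruction, enumerate
# the categories of information you want, so each becomes its own
# specific search query. Category names here are assumed examples.
CATEGORIES = [
    "official news (funding, acquisitions, leadership changes)",
    "product launches and partnerships",
    "industry and press coverage",
]

def build_queries(competitor: str) -> list[str]:
    """One specific, time-bounded query per category per competitor."""
    return [
        f"Find {category} about {competitor} from the past 7 days."
        for category in CATEGORIES
    ]

queries = build_queries("Acme Corp")
```

The specificity is the point: a query that names "funding" is far more likely to surface a Series C than a generic "research Acme Corp."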
David (06:19)
And is that
something that has been incorporated into the system prompt? Where you say, okay, here's the template that we're looking to fill.
Ilan (06:25)
That's right.
That's right, exactly. So we're telling it that we want official news, product and partnerships, industry coverage, et cetera. So I would expect that each one of these becomes a call to Perplexity to search for that specific information for that competitor.
David (06:43)
Got it.
Ilan (06:43)
And that's well and good, but you run into another problem.
You end up hitting maximums or limits in n8n or your LLM. In this case, we hit 10 iterations on the Anthropic chat model, which is the default limit. You can increase that limit, you can make it 30 or a hundred, but then you run into a second problem, which is that eventually your context grows and grows. You make one call about one competitor to Perplexity, that gets updated in the think tool, then that context gets sent back to Anthropic, and then it makes another call to Perplexity, and it stacks up all of this context about your competitors.
David (07:16)
Mm-hmm.
Ilan (07:25)
and eventually you hit a token limit in the chat model. So for example, Anthropic has a 10,000-token-per-minute rate limit. So even if you have a longer context window, that's the maximum number of tokens you can send in a single minute, at least on the tier of user that I am.
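The context-stacking problem has a simple arithmetic shape: every tool result gets appended, so each LLM call re-sends everything so far. This is a rough back-of-the-envelope illustration, not n8n code, and the token sizes are assumed round numbers, not measured values.

```python
# Why a single agent loop blows past a token budget: each call carries
# the base prompt plus every tool result accumulated so far.
TOKENS_PER_TOOL_RESULT = 1_500   # assumed size of one Perplexity response
BASE_PROMPT_TOKENS = 800         # assumed system prompt + competitor list
BUDGET_PER_CALL = 10_000         # the rate limit mentioned above

def tokens_for_call(n_results_so_far: int) -> int:
    """Tokens sent on the next LLM call given accumulated tool results."""
    return BASE_PROMPT_TOKENS + n_results_so_far * TOKENS_PER_TOOL_RESULT

# First call that exceeds the budget under these assumptions.
first_over = next(
    n for n in range(100) if tokens_for_call(n) > BUDGET_PER_CALL
)
```

Under these assumed numbers the loop only survives a handful of tool calls before a single request exceeds the budget, which is why raising the iteration cap alone doesn't save you.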
David (07:28)
Mm-hmm.
I'd imagine that also dramatically increases your costs too, right? I mean, this is not a one-time thing, it's a continuous thing. So, you know, with every incremental improvement that you deploy here, the cost is going to scale significantly.
Ilan (08:03)
Yeah, it's a good point. To be honest, David, I wasn't really controlling for costs here; I was more focused on getting the result. The cost... costs whatever. I mean, it's a couple of dollars a week, so I was okay with that. But I agree with you that longer term, that's really a consideration you have to take into account, right? How much are you actually spending on wasted tokens?
David (08:12)
Okay, Mr. Moneybags. That just... Oh yeah, cost whatever. It's fine.
Mm-hmm.
Ilan (08:32)
So the agent now has the ability to do all of the research that it needs to do, and it provides the depth of research you're needing, but you're hitting limits in the tools that you're using. And of course you can switch out the model. So my first step was to switch to OpenAI instead, and that gave me slightly higher limits. But you still end up with more risk of hallucination, because suddenly it's got cross-context from different types of searches and has too many jobs to do.
David (09:05)
Mm-hmm.
Ilan (09:05)
This week's episode is also brought to you by n8n.
Its beautiful UI allows any user to go in and experiment with building AI agents, but also create the types of workflows and evals that are required for production-level agents.
Try it today by visiting the link in our show notes and you'll get two weeks free.
Ilan (09:29)
So then the next step, logically, was to separate into sub-agents. I have three things that I wanna achieve: I want news about my competitors, I wanna understand user sentiment (are people complaining about them? are people super happy about them?), and then I wanna synthesize all that information together and send off an email.
David (09:46)
And is this part of the template or this is where you started to create your own stuff here?
Ilan (09:51)
Starting here, this was where I started to branch off into my own work.
David (09:57)
You forked his repo, you mean?
Ilan (09:59)
That's right.
And another step that I took here was to split out some of the work that is deterministic away from the agent. So yeah, we can create agents now that have a bunch of tools in front of them and can do a bunch of stuff with those tools. But in the case of a strict list of competitors that you want to research each time, there's no reason to make the LLM call for that each time. Instead, in your workflow, you can just get the competitors from Google Sheets, aggregate them, and then pass that into your agent, right? Let's take the deterministic step out of the agent's hands.
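That split can be sketched in a few lines: fetch and aggregate the list in ordinary code, then hand the agent one finished input instead of giving it a "get competitors" tool. A sketch only; `fetch_rows` is a hypothetical stand-in for whatever reads your Google Sheet, and the row shape is assumed.

```python
# Deterministic pre-processing: the agent never decides when or how to
# fetch the competitor list; plain code does it the same way every run.
def fetch_rows() -> list[dict]:
    """Stand-in for reading rows from a Google Sheet."""
    return [
        {"name": "Acme Corp", "active": True},
        {"name": "Globex", "active": True},
        {"name": "Initech", "active": False},
    ]

def aggregate_competitors(rows: list[dict]) -> str:
    """Filter, dedupe, sort, and join into one string for the prompt."""
    names = sorted({row["name"] for row in rows if row["active"]})
    return ", ".join(names)

competitor_input = aggregate_competitors(fetch_rows())
# The agent's prompt then receives competitor_input verbatim.
```

The design choice: anything with exactly one correct answer (read a sheet, filter, join) shouldn't cost tokens or risk a wrong tool call.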
David (10:38)
Mm-hmm.
Yeah, that's a really good point. I think that in the frothy enthusiasm to deploy AI everywhere, people forget that the better route tends to be to do it in the classic way, and only then, when it gets too complex for the classic method, should we fall back to a more probabilistic method using LLMs.
Ilan (11:07)
That's right. Or another way to put it is: where do you actually need thinking in the process? Let's focus on making the agents do the steps that require thinking.
All right, so in my head, logically, I have two separate agents that I want to run simultaneously to do two different jobs at the same time. And then I want to synthesize their information and write an email. And this is what it looks like. Again, this will all be provided in the show notes. You can...
load this into your own n8n instance. But now we have a news researcher, and this only has two jobs to do for each competitor. And all it has to do is return a structured array of its findings.
David (11:58)
I noticed that there's a little toggle there to say "require specific output format." I suppose that's another way to go about that, rather than doing it in the prompt.
Ilan (12:08)
Yeah, you can do that. What I found is that just specifying the output format in your prompt is sufficient, because the next agent will just interpret whatever information it gets.
The toggle is important if, again, you need determinism for the next step in your process: it must be JSON with a specific format. But it does require an additional tool to be added, which is why I find that it's simpler to just specify it in your system prompt.
David (12:26)
Mm-hmm.
Got it.
Ilan (12:40)
Meanwhile, we have a sentiment analysis agent. This is actually only performing one search, and it's just trying to understand what user complaints or frustrations have come up in the past week for your list of competitors.
So this seemed great. The problem we run into here is that it turns out n8n doesn't run things in parallel. n8n is a sequential tool; each workflow is meant to run sequentially. And the way that it determines the sequence is based on the diagram: it runs things left to right, top to bottom. So what ends up happening here is it ran
David (13:14)
Hmm.
Ilan (13:18)
here to my news agent, ran the synthesis, then came back and ran the sentiment analysis. And yes, you could adjust this, but if you want to make sure that the two sub-agents always run before the synthesis, it seemed like too big of a risk to take
David (13:36)
Mm-hmm.
Ilan (13:37)
to keep this going in production.
David (13:39)
Yeah. So what did you do about that?
Ilan (13:46)
Right, so this is what I did about it, David. I sequenced the agents. If they're gonna run sequentially anyway, and they're not gonna run in parallel, then what's the point of making a flow that looks like things are running in parallel? So this is what it looks like at the end. This is actually running for me weekly now, looking for competitor news.
There's a couple other things that I did in here, but I'll pause for a second.
David (14:11)
I noticed that you changed from OpenAI to Anthropic.
Ilan (14:15)
One of the benefits of building tools in n8n is that you're not tied to a specific model. It takes about 10 seconds to switch out Anthropic for OpenAI. I can even show you right now how you would do that.
You just remove the Anthropic model, choose a new chat model, go with OpenAI instead, and then, since I already have my account, choose the model that I want. And there we go, we've switched out Anthropic for OpenAI. So it's an easy way to test how each model performs for the tasks that you have.
David (14:49)
Yeah, yeah. But you do need a subscription in either situation, right?
Ilan (14:55)
In all situations, you would need an API key from that model provider. So it's not a paid subscription; it's a funded API account.
So we have everything running in sequence here, and this runs really well. The other change that I made was I made the synthesizer not an agent, but just an LLM call, because all it was doing was creating an email and then sending the email. And sending an email is, again, a deterministic step that can be taken. So all I really needed that step to do was to synthesize the data from the two agents previously.
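The restructured final step can be sketched as one model call sandwiched between deterministic code. A sketch under stated assumptions: `llm_call` and `send_email` are hypothetical stubs standing in for the n8n LLM node and email node, not real APIs.

```python
# The synthesizer as a plain LLM call (the only non-deterministic step),
# with the email send handled by ordinary deterministic code.
def llm_call(prompt: str) -> str:
    """Stub for the real chat-model node."""
    return "Weekly competitor digest: ..."

def send_email(to: str, subject: str, body: str) -> bool:
    """Stub for the real email node; returns True on 'send'."""
    return bool(to and subject and body)

def synthesize_and_send(news: str, sentiment: str, recipient: str) -> bool:
    prompt = (
        "Combine the following into one email body.\n"
        f"News findings:\n{news}\n\n"
        f"Sentiment findings:\n{sentiment}"
    )
    body = llm_call(prompt)  # single model call, no tools, no loop
    return send_email(recipient, "Weekly competitor digest", body)

sent = synthesize_and_send(
    "Acme raised a Series C", "Users like Globex", "me@example.com"
)
```

Demoting an agent to a single LLM call removes the iteration limit and tool-routing risk entirely for steps that never needed tools.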
David (15:30)
That's cool.
Ilan (15:31)
Now, this does run into one last problem, which is time. This takes quite a long time to run. In the example that I have here, this is just running on a list of three competitors, but if you have a list of like 30 or 40 different market players that you're trying to analyze week on week, the tool takes a long time to run, and you can still run into limits on each of the sub-agents that are trying to do multiple calls.
David (16:01)
Hmm.
Ilan (16:02)
So, preview for a future episode: there are tricks you can use in n8n to get tool calls to run in parallel, and also to loop over data so you're only passing one market player, one competitor, at a time to sub-agents. We'll go over how to make that work another time.
David (16:21)
That's for a deeper dive on n8n.
Ilan (16:23)
That's right.
That'll be three important lessons for how to get sub-agents to run in parallel.
David (16:29)
Awesome. Well, hey, thanks for walking us through this. I think, you know, the entire setup of all of this is fairly straightforward, right? It was all point and click. You didn't need to know any kind of code. You had a starting foundation, and it was just kind of tinkering with it from there.
Ilan (16:45)
Mm-hmm.
Yeah, absolutely. And tinkering is an important skill to have to be able to get agents to work. So just to summarize the three key lessons here: one, use your logs to observe what's going on in your agent if you're not getting the results that you're looking for.
The second one is: if you're hitting limits in your agents or your LLMs, or you're dissatisfied with the performance of your agent because it's trying to do too many things, then it's time to split into sub-agents. And the third one is: really consider when you need things to run in parallel or not.
But at least in n8n, if your tools are gonna run sequentially, then you might as well make your whole workflow sequential and accept that the time is gonna be a little longer for each call.
David (17:45)
Those are great lessons, and I think anybody who builds any kind of agent more sophisticated than just a little POC needs to take those into account.
Ilan (17:54)
All right, with that, that's all we've got for you this week. We hope you enjoyed it. Are you looking for other information or tutorials about agents or AI? Let us know in the comments. Otherwise, give us a like, subscribe, send this to a friend who might enjoy it, and we'll see you next week.
David (18:10)
See you next time.