Do this to close loops with your AI agents

David (00:00)
I'm trying to think of something interesting to say, and it's like: this is great. It's educational.

David (00:04)
Hey everybody, welcome to Prompt and Circumstance. My name's David. And today we're going to walk you through how to get OpenClaw to create self-healing products.

Ilan (00:08)
And I'm Ilan.

David (00:29)
So a couple of weeks ago, Ilan, you were talking about improving error handling for the agent that we're working on. How did you do that? Could you walk us through it?

Ilan (00:38)
Yeah, absolutely. It's a good question, David. So just for the audience, for anyone listening: basically, we have this OpenClaw agent that's monitoring our product for errors. And that's great and all, but we really wanted to close the loop on this.

And generally, all of you need to be thinking about: how do I close a loop with AI agents? Because it's so easy to spin up new loops that stay open, where your agent's like, "Hey, here's a thing. Now I'm telling you." Okay, and then what? Right? The whole point of an agentic system is that it needs to finish the task, or at least take the task as far as you trust it to. So if you're doing the work yourself after the agent tells you something, then that's a good hint that you should be thinking about how you might close this loop.

So here's where we started, and here's the loop that we had.

What happened was: we have a product, it runs on top of n8n, and n8n has really detailed logs of every run that happens. And so we had a scheduled error scan that would go into n8n, pull the latest logs across all of our workflows, detect any errors, and flag any critical ones.

And then it would alert us. For us, it's in Telegram, but this could be wherever OpenClaw functions. And then I would see that error, investigate, and go manually fix whatever the issue was, and then the issue's gone. So this was already pretty helpful. It meant that we caught a number of issues before our customers ever saw them. We were able to fix them, and customer experience never degraded as far as the user was concerned.
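
Note: the scan-and-alert loop described above can be sketched roughly as follows. This is a minimal illustration, not the hosts' implementation. The `Execution` record, the `CRITICAL_KEYWORDS` rule, and both function names are assumptions made up for the example; real n8n execution logs have a different, richer shape, and the alert text would be handed to whatever chat channel the agent uses.

```python
from dataclasses import dataclass

# Hypothetical execution record; real n8n logs look different.
@dataclass
class Execution:
    workflow: str
    status: str        # "success" or "error"
    message: str = ""

# Assumed rule: an error whose message contains one of these words is critical.
CRITICAL_KEYWORDS = ("timeout", "auth", "payment")

def scan(executions):
    """Return one finding per failed execution, marking the critical ones."""
    findings = []
    for ex in executions:
        if ex.status != "error":
            continue
        critical = any(k in ex.message.lower() for k in CRITICAL_KEYWORDS)
        findings.append(
            {"workflow": ex.workflow, "message": ex.message, "critical": critical}
        )
    return findings

def format_alert(findings):
    """Build the alert text for critical findings, or None if there are none."""
    critical = [f for f in findings if f["critical"]]
    if not critical:
        return None
    lines = [f"{len(critical)} critical error(s) found:"]
    lines += [f"- {f['workflow']}: {f['message']}" for f in critical]
    return "\n".join(lines)
```

In this version the loop is still open: the function stops at producing a message, and a person does everything after that.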

David (02:22)
Yeah, this can certainly apply to enterprise situations too, right? So maybe if you're not ready for fully automated agents, you can still have this sort of human-in-the-loop aspect to it.

Ilan (02:35)
Absolutely. But this got me thinking, why am I monitoring this flow? I mean OpenClaw is running on top of an LLM. LLMs can write code. So why is it that I'm having to figure out what happened on the other side?

David (02:48)
This has nothing to do with the exasperation that you'd have seeing yet another error happen, right?

Ilan (02:55)
Yet another error? What are you talking about, David? They basically never happen. Our product is perfect.

David (02:59)
Ha ha

Ilan (03:00)
Alright, so this gets us to the recommended fix flow. And this is where we were until a couple of days ago. So on top of the scheduled error scan, when an error was found, then the agent would actually go through a diagnosis step itself and then provide a brief that a fix was ready. Essentially, hey, here's what I would do if I were you to fix this issue.

And then I would take that and implement it myself. This is a pretty good human in the loop workflow because it left me to actually implement the fix, which meant that I would notice if something happened. And I'm still the one making the key decision.

So this shows the first kind of progress toward closing a loop with AI. The first step is to have the AI notice something is happening and let you know. The second step is to have it notice something is happening, come up with a possible solution, and then let you know. And that brings us to the current state.

Now what we have is a monitoring flow. Same kind of deal, except that the first check of the logs runs on a much cheaper model. In OpenClaw, you can set up your agents to run on a certain model on a schedule. And so the first scheduled agent uses GPT-5.4 mini, because all it's doing is checking some logs and flagging whether there are patterns of errors. It does the flagging and stores the result in a file.

Then a second agent runs, based off of the file that the first one generated. It uses a stronger model, GPT-5.5. It diagnoses the problem, comes up with a fix, and then stores its plan for the fix, including all the code that it was going to write to fix that problem. Then it sends a note to Telegram saying this fix plan is ready for approval, with a summary of what it's suggesting to do. And then all I have to do is say approve or not.
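
Note: the two-stage flow with an approval gate can be sketched like this. It is an illustration under stated assumptions, not how OpenClaw is configured. The models are stand-in callables (`cheap_model`, `strong_model`), the JSON file layout is invented, and all four function names are hypothetical.

```python
import json
from pathlib import Path

def triage(log_lines, cheap_model, findings_path):
    """Stage 1: a cheap model flags error lines and stores them in a file."""
    flagged = [line for line in log_lines if cheap_model(line)]
    Path(findings_path).write_text(json.dumps(flagged))
    return flagged

def plan_fix(findings_path, strong_model, plan_path):
    """Stage 2: a stronger model reads that file and stores a fix plan."""
    flagged = json.loads(Path(findings_path).read_text())
    if not flagged:
        return None  # nothing to fix, so no plan and no notification
    plan = {"errors": flagged, "fix": strong_model(flagged), "status": "pending"}
    Path(plan_path).write_text(json.dumps(plan))
    return plan

def approval_message(plan):
    """Summary sent to the human, who answers with approve or not."""
    count = len(plan["errors"])
    return f"Fix plan ready for approval ({count} error(s)): {plan['fix']}"

def record_decision(plan, reply):
    """Only an explicit 'approve' unlocks the plan; anything else rejects it."""
    approved = reply.strip().lower() == "approve"
    plan["status"] = "approved" if approved else "rejected"
    return plan
```

Passing the findings through a file keeps the two stages decoupled, so the expensive model only runs on what the cheap one flagged.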

David (04:59)
That's cool.

Well, Sam Altman thanks you for your service, for your patronage. What didn't work with some of the other models?

Ilan (05:08)
It has less to do with what didn't work and more to do with where my subscriptions are. I've mentioned this in the past, but the thing I have liked about OpenAI is that they let you use your monthly subscription to power your OpenClaw agent. And so I don't have to pay for a ton of extra usage. It mostly fits within my monthly plan.

David (05:31)
Makes sense. Yeah.

Ilan (05:33)
That said, the cheap model can be reasonably small, and it may even be worth testing a powerful local model or a really dumb small-parameter Qwen3 model.

So this is the current state of our OpenClaw monitoring agent.

And this is where we're going to get to, which is a completely autonomous flow. The autonomous flow can come once you've seen that the fixes being proposed by that stronger model are consistently, let's say always, working. That is to say, at least they never degrade the product from the customer's or the user's perspective.

So even if they don't fix the error, that's not great, but it's okay as long as the customer, again, just doesn't notice that there's a problem. And so this one would work the same way. The error scan would run on a cheaper model and write to its file. The stronger model would pick that up, make the recommendation of a fix, and then actually implement the fix, test the fix, and apply it, in a way that's guarded though. So with n8n, for example, what we have it do is create a clone of the workflow that it wants to update, make the fix in the clone, and then test the clone with the data that previously caused the errors. And then once all the tests pass, it can update the original, the production flow, with the changes.
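
Note: the clone, test, then promote guard can be sketched as below. This is a simplified illustration, assuming a workflow can be represented as a plain dictionary and executed by a `run` function; `guarded_update`, `apply_fix`, and `run` are hypothetical names, not n8n or OpenClaw APIs.

```python
import copy

def guarded_update(production, apply_fix, run, failing_inputs):
    """Fix a clone, replay the inputs that used to fail, promote only if all pass.

    production:     dict describing the live workflow (hypothetical shape)
    apply_fix:      function that changes a workflow dict in place
    run:            function (workflow, payload) -> result; raises on error
    failing_inputs: payloads that previously caused errors
    Returns (workflow_to_use, promoted).
    """
    clone = copy.deepcopy(production)  # production stays untouched while testing
    apply_fix(clone)
    for payload in failing_inputs:
        try:
            run(clone, payload)
        except Exception:
            return production, False   # any failure: keep the original
    return clone, True                 # every replay passed: promote the clone
```

The key property is that a bad fix can never reach production: the original object is returned unchanged unless every previously failing input now runs cleanly.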

David (07:02)
I imagine that for the testing step, you might want to use a different agent or maybe even a different model, because otherwise it's sort of like asking somebody to check their own work. "Of course I did a good job." I've noticed that happening even in very simple work with LLMs, not code-based stuff but document-based work. So that might be a thing to consider.

Ilan (07:14)
Yeah.

Yeah, I mean, it's a good consideration. Right now, the way that it works is that there are separate agents spun up. They are using the same model, but they have different purposes. So the testing agent is independent of the implementation agent.

David (07:39)
And can you explicitly tell it, "Hey, for this testing agent, always use this model"?

Ilan (07:46)
Yes, you can give it instructions that it should always spin off a testing agent and that agent should always use a certain model. Yep.
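
Note: one way to picture "separate agents, each pinned to a model" is a small role-to-model table. This is only a sketch. The table, the placeholder model names, and `spawn_agent` are all assumptions for illustration and do not reflect OpenClaw's actual configuration format, which in the conversation is driven by instructions to the agent.

```python
# Hypothetical role-to-model table; not OpenClaw's configuration format.
AGENT_MODELS = {
    "scanner": "cheap-model",
    "implementer": "strong-model",
    "tester": "strong-model",  # same model today; could be swapped for another
}

def spawn_agent(role, task, models=AGENT_MODELS):
    """Return a fresh agent description pinned to the model for its role."""
    if role not in models:
        raise ValueError(f"no model configured for role {role!r}")
    # Each agent gets its own empty history, so the tester never sees
    # the implementer's reasoning and has to judge the result on its own.
    return {"role": role, "model": models[role], "task": task, "history": []}
```

Changing the `"tester"` entry to a different model would cover David's suggestion without touching the rest of the flow.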

David (07:55)
Yeah, very cool.

Ilan (07:56)
And then the closed loop at the end of this flow is a notification that, hey, I found an error. Here's how I fixed it. Here's the proof that it's now working and it's in production.

David (08:09)
Got it. Yeah, this makes sense. I like the remark that it might not fix the error, but if the customer doesn't notice, it's fine. It's like, if an error is thrown and the customer isn't there to experience it, was there an error at all?

Ilan (08:18)
Mm-hmm.

I mean, at the end of the day, we want our software products to be bug free, but the reality is that as you make changes and updates, bugs are introduced, especially when you have external dependencies, like calling other models from providers or using other third-party tools. So it's just the reality that issues are going to happen, and the main thing that you want to control for is making sure that the customer has an excellent experience as much as possible.

David (08:54)
Yeah, absolutely. That's what you want to optimize for.

Ilan (08:56)
So that's that. That's our error monitoring flow, and that's how we're closing the loop now and where we're going in terms of closing the loop in our product.

David (09:05)
Yeah, thanks for walking us through that, Ilan. I'm thinking how cool it would be to see that in a diagram where you kind of step up.

Ilan (09:12)
We're in the, you know, the monkey with the spear right now, but soon we'll be standing up, and then we'll be hunched over the...

David (09:20)
We hunched over. Yes, exactly.

Awesome. That was very educational for me, and I hope it was very useful for everybody else watching this. Stay tuned for more. We're diving really deep into AI agents, and it's not just because it's a craze; it's because it's making a difference for us and the people we are working with. So stay tuned, and we're going to tell you a lot more about this.

Ilan (09:43)
Absolutely. Thanks so much for listening. Catch you next time.

David (09:45)
See you next time.

© 2025 Prompt and Circumstance