RAG, Clearly Explained
Ilan (00:00)
That's what we're doing here. We're building. We're representing a very small part of the world in this vector database.
David (00:00)
Anyway, we're representing the world.
Yeah.
Ilan (00:09)
Cool.
David (00:09)
That was pretty good content.
Let's, let's, let's take that, that little snip.
Hey everybody. This is Prompt and Circumstance. My name is David, and today we're going to talk about RAG, retrieval augmented generation.
Ilan (00:16)
And I'm Ilan.
David (00:35)
All right, so for today, we're gonna talk about RAG, but also about how you can set it up on N8n. We're gonna walk through not only the setup, but also the concepts that are involved, like a vector store. Some of us might have heard that terminology, and I think it's important for us to get a little bit better understanding as to what that is.
Ilan (00:55)
Absolutely.
This will be great for product professionals who wanna understand how chat agents work a little bit better, and give them confidence in talking with business and engineering stakeholders in their companies.
David (01:09)
It's a good one.
Ilan (01:10)
This week's episode is brought to you by Querio. Have you ever found that your data team is bogged down in ad hoc questions from users, or that your product team can't quite get the answers they're looking for out of your data? Querio's AI agents sit on top of your data stack and allow you to ask natural language questions to get the answers that you need immediately.
Try it today by going to querio.ai, and let them know that David and Ilan sent you for two months free of their Explore product. That's a thousand dollar value.
David (01:38)
Okay. So we're here to talk about RAG to riches. This is a term that I think a lot of people have heard, but for those who aren't familiar with what RAG is: it stands for retrieval augmented generation. Maybe give us a little bit more context as to what that means.
Ilan (01:43)
Yeah, absolutely. So have you ever gone to your favorite LLM and asked it a question, and it gives you an answer that is maybe factually true, but doesn't at all pull from the kinds of data sources that you're looking for? The reason that happens is that LLMs have access to all of this information across the entire web, across their training data, and they don't really know what's
contextually important. And if you're building products with AI, it's important that if you want an LLM to search over documentation, that it's only using that documentation as its source. And RAG is the way that you achieve that. You basically tell an agent, hey, this is your data source that you should be using. And it allows it to search.
over those documents or whatever information that you've provided it. And tools like Notebook LM, which I'm sure a lot of people are familiar with, this is actually what they're doing. They're creating this RAG database.
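The retrieve-then-generate loop Ilan describes can be sketched in a few lines of JavaScript. This is a toy illustration, not any real library's API: the `embed()` function here is a fake keyword-count embedding standing in for a real embeddings model, and the documents are made up.

```javascript
// Minimal RAG sketch: retrieve the most relevant documents for a question,
// then build a prompt that restricts the LLM to those documents only.

function embed(text) {
  // Toy embedding: count a few keyword "dimensions". A real embeddings
  // model returns a dense vector with ~1024 dimensions.
  const dims = ["rag", "vector", "tweet", "database"];
  const words = text.toLowerCase().split(/\W+/);
  return dims.map(d => words.filter(w => w === d).length);
}

function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = v => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b) || 1); // || 1 avoids divide-by-zero
}

const docs = [
  "RAG pairs a vector database with an LLM.",
  "Our tweet database stores one row per tweet.",
  "Bananas are rich in potassium.",
];
const index = docs.map(d => ({ text: d, vec: embed(d) }));

// Return the k documents whose vectors are closest to the question's vector.
function retrieve(question, k = 2) {
  return index
    .map(d => ({ text: d.text, score: cosine(embed(question), d.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(d => d.text);
}

const context = retrieve("How does the tweet database work with RAG?");
const prompt = `Answer ONLY from these sources:\n${context.join("\n")}`;
console.log(prompt);
```

A real system would swap `embed()` for calls to an embeddings model and send the assembled prompt to an LLM, but the retrieval step stays structurally the same.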
David (03:01)
Yeah, that's a really good explanation. I think some of us have also experienced this with customer support, where instead of talking to a human, it's, hey, just ask this chatbot and it'll give you some answer. Now, previously it would just do a really dumb search, but now it's a bit more intelligent in terms of its response. It'll bring together maybe several articles and form its answer that way.
Ilan (03:14)
Mm-hmm.
That's right. And this is important for product teams to understand, because more and more of us are going to have these types of products built into our user experiences. And if you understand how they work, then you can speak confidently with the engineering teams and business teams who you need to work with to get this approved and actually get this built.
David (03:52)
All right. So, Ilan, you're going to walk us through what it's like to actually create a RAG workflow. Some of the terms for our audience might be a little bit intimidating, but I think it's just because of the new terminology. So we're going to walk you through all of it, and it'll be explained as we go along the way.
Ilan (04:10)
That's right. All right, let's get right into it.
All right, David, we're gonna use N8n to build this workflow, but there are three tools that we'll need outside of it to make it work. So I'm going to walk through what you need to have set up.
And all of this will be provided in a document in the show notes for you to follow along.
So first, part of having a RAG workflow is having what's called a vector database. And a vector database is one of those terms we talked about earlier that may seem very techy and difficult to understand. But all it really is is a regular database that has a field that stores
information in a way that an LLM can easily access it and understand what's in a document. So we don't need to understand anything beyond that. We are going to use a tool called Supabase today, and you'll need to create an account in Supabase. And that's it. Just have an account set up. It's totally free.
David (05:14)
I like that, I like that it's free.
Ilan (05:16)
The second thing we're going to need is what's called an embeddings model. An embeddings model, again, may sound complicated, but all it is is a specialized type of LLM that translates regular human-readable text into the format that LLMs understand better for doing this RAG work.
David (05:39)
Great.
Ilan (05:40)
For this, we're gonna use Cohere. Big ups to a Canadian company; we're both big fans. Here you're going to create an account and then go to your API keys, and you should see a free trial key available. That's all you'll need for today.
David (05:54)
Are there any limits on this free trial key?
Ilan (05:57)
Yeah, it is rate limited. You can't really use it for production workflows, and you may find, if you're trying to embed huge, huge documents, that you will hit those rate limits. But for the example I'm showing today, it won't be an issue.
David (06:16)
Great.
Ilan (06:16)
All right, lastly, we're using X as a data source for our workflow. If you wanna just copy the workflow that we're going to provide, then you'll also need a developer API key from X. And you can get that at console.x.com and you will need to fund it. So this is the only step in the process that is not completely free.
However, it is optional. You can choose whether you want to use X or some other data source. But if you want to follow along step by step with us, then you will need the API key that comes from X.
David (06:52)
And it's not that expensive in terms of testing this out, right?
Ilan (06:56)
If you fund it with a dollar or two, it'll be plenty for you to be able to test out this flow.
So here we are in N8n. I'm going to start with this pre-built workflow.
And this flow is just what's grabbing documents for us. So this is the pre-work that's required in order to start the RAG workflow. And this will be provided to you absolutely free in a link in the show notes. So no need to build this yourself. But I'll give a quick explanation right now of what this does.
All right, so all we're doing in this flow is we're grabbing a list of Twitter accounts from different product management and AI influencers, and we're searching over their tweets for the last week. The starting point for this is a Google Sheet that has a list of accounts. So if you want to change this to be more tailored to your use case, all you need to do is change the list of accounts that are showing up in the sheet.
David (07:51)
Yeah, it looks really straightforward.
Ilan (07:52)
And where I mentioned that you need the X developer API key is in the search tweets node from N8n. So when you're importing this, you'll have to open this guy up and connect your credentials to this node in order to get this to work.
So what we're gonna do here is we're gonna click the plus, add a code node, and choose code in JavaScript.
The first node we're going to add is called prepare for embedding. All this is doing is grabbing the data that we got from Twitter, all the metadata about those tweets, and setting it up in a format that will be easier to load into the database.
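A code node like the one Ilan describes might look roughly like this. The exact input field names (`text`, `author`, `likes`, and so on) are assumptions about the Twitter response shape, not the actual code from the workflow:

```javascript
// Sketch of a "prepare for embedding" code node: reshape raw tweet objects
// into the { pageContent, metadata } format a document loader expects.

function prepareForEmbedding(tweets) {
  return tweets.map(t => ({
    pageContent: t.text,   // the text that will actually be vectorized
    metadata: {            // extra context stored alongside the vector
      author: t.author,
      likes: t.likes,
      retweets: t.retweets,
      url: t.url,
    },
  }));
}

const raw = [
  {
    text: "Ship small, ship often.",
    author: "@pm",
    likes: 42,
    retweets: 7,
    url: "https://x.com/pm/status/1",
  },
];
console.log(prepareForEmbedding(raw));
```

In an n8n code node, the same idea would be applied to the node's incoming items rather than a local array.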
David (08:35)
That's cool. Now is, is this something that you hand wrote yourself or did you like grab it from somewhere or did you ask AI to generate this?
Ilan (08:44)
Absolutely not. I did not write a single line of code to make this happen. I went to Claude, which is my preferred LLM of the moment, and Claude wrote this for me, and it worked right off the bat. With that said, this is shared in the workflow that we're linking in the show notes. So you'll be able to grab this yourself, and it will also be there if you import this workflow into n8n yourself.
David (09:10)
Okay.
Ilan (09:11)
So we're just going to connect that up to our workflow. And then we're going to start making the magic happen.
All right, the next thing we did was we added a Supabase vector store node. This node allows you to connect your Supabase account and select a table. To create the table, you're gonna go into Supabase and you're gonna have to run a specific command. And again, that's shared in the workflow document that we will be providing. It seems difficult, but all you have to do is copy and paste what we put there: click on the right place in Supabase and paste it. I'll show you exactly how I did that right now.
In Supabase, you're gonna click on the SQL editor on the left-hand side, and you're going to run this query, which we've provided to you. This is going to create the table that we're going to store the documents in, with that vector embedding that we talked about earlier.
I did not write this, and you don't need to write this; it came straight out of an LLM. And again, it's just copy-paste from what we shared with you.
David (10:23)
By the way, just a side conversation, because I just want to nerd out for a little bit: how much do you know about what's actually happening with the embeddings in the vector store?
Ilan (10:27)
Mm-hmm.
So it's tokenizing the document that you have. And then for each token, it's basically creating a score across however many dimensions your embeddings model provides. Then when you do a search, what the agent is doing is looking across those, trying to find any documents whose numbers are
David (10:50)
Mm-hmm.
Ilan (11:05)
closest to whatever the search tokens are.
David (11:10)
Yeah, that's pretty close to my understanding as well. I guess, from a philosophical perspective, or maybe a linguistic perspective, it's like, okay, each of those 1024 dimensions represents maybe an idea, right?
Ilan (11:16)
Mm-hmm.
David (11:26)
So it's like, okay, maybe one dimension has to do with which product it is, right? And that's some number between zero and one, I suppose, something like that. So each document has some degree of belonging across these different properties: okay, this is a technical document for this product, and it talks about this thing.
And it's embedded in that higher-dimensional space to say, look, if you're ever looking for stuff that's related, if you're looking for this kind of thing, that's where it is. And like you said, with the embeddings it's like, okay, here's what those dimensions refer to,
or what a document is about, what a record is about. Yeah. And, you know, the same thing happens.
Ilan (12:14)
or what a document, yeah. And it's actually what
the tokens in that document are about.
David (12:23)
Right. Yeah.
Ilan (12:23)
so
that you can find the right part of the document to...
David (12:28)
I see. Okay. I learned something there. Yeah, that makes sense.
Ilan (12:34)
Yeah.
David (12:35)
It's such an interesting way of representing the world. Because that's basically what it is. It's a world representation. So that was interesting to me.
Ilan (12:39)
Mm-hmm.
Yeah,
That's what we're doing here. We're building. We're representing a very small part of the world in this vector database.
David (12:43)
Anyway, we're representing the world.
Yeah.
Ilan (12:52)
This week's episode is also brought to you by N8N.
Its beautiful UI allows any user to go in and experiment with building AI agents, but also create the types of workflows and evals that are required for production level agents.
Try it today by visiting the link in our show notes and you'll get two weeks free.
Ilan (13:11)
All right, we're almost done here with the first part of our workflow. So we have our Supabase vector store, and you see these two little prongs underneath it. One is for the embedding and the other one is for the document loader. The document loader is what breaks up the text that you're sending it
into tokens and then the embeddings model is what converts those tokens into this vector representation that the LLM likes to be able to understand what it's about.
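As a rough mental model of those two prongs, here's a toy sketch: a loader that splits text into fixed-size chunks, and a stand-in embedding step that maps each chunk to a short vector. This is illustrative only; real loaders split text more intelligently, and a real embeddings model returns on the order of a thousand numbers per chunk.

```javascript
// Toy document-loader step: split text into fixed-size chunks.
function splitIntoChunks(text, chunkSize = 20) {
  const chunks = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Stand-in for the embeddings-model step: map each chunk to a fixed-length
// numeric vector (two made-up features here; a real model returns ~1024).
function embedChunk(chunk) {
  return [chunk.length, chunk.split(" ").length];
}

const doc = "A document loader splits text before the embeddings model runs.";
const vectors = splitIntoChunks(doc).map(embedChunk);
console.log(vectors.length); // one vector per chunk
```

The vector store then saves each chunk's text next to its vector, which is what makes the later similarity search possible.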
David (13:43)
Okay.
Ilan (13:44)
So when we click the embeddings node, you'll see the embeddings cohere model available among all the models. So that's what we're going to select. And when you click the document node, you'll see the default data loader is the only option available to us. So we're gonna add these two and connect them up.
And in the embeddings node, once you add your credential for Cohere, you're going to choose a model. For our case, we chose embed-english-light-v2.0, with 1024 dimensions.
Be very, very careful here. You must choose a model that has the same number of dimensions as what you added in Supabase. Otherwise, your results are gonna be bogus.
The script we had you run in Supabase has a 1024-dimension vector field, so you have to choose a model with 1024 dimensions. If you want to change the number of dimensions, you have to change the table in Supabase.
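That dimension-match rule can be expressed as a small sanity check. This is an illustrative sketch, not part of the actual workflow; the 1024 constant is the value from the table-creation script discussed above.

```javascript
// Sanity check for the dimension-match rule: the embeddings model's output
// length must equal the vector column's dimension in the Supabase table.

const TABLE_VECTOR_DIMENSIONS = 1024; // from the CREATE TABLE script

function assertDimensionsMatch(embedding) {
  if (embedding.length !== TABLE_VECTOR_DIMENSIONS) {
    throw new Error(
      `Model returned ${embedding.length} dimensions but the table ` +
      `expects ${TABLE_VECTOR_DIMENSIONS}; similarity results would be bogus.`
    );
  }
  return true;
}

// Stand-in for a model response, not a real API call.
const fakeEmbedding = new Array(1024).fill(0.01);
console.log(assertDimensionsMatch(fakeEmbedding)); // true
```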
And in the data loader, we're going to choose the load specific data mode. I mention this because it tripped me up for quite a while: I had this set to load all input data, and I could not get this to work for the life of me. So note, you must have load specific data. And all of these settings are going to be in the workflow that you can just import.
Ilan (15:20)
Basically, what we're putting in here is the data we want to load, the data we want to vectorize: this page content field that includes the tweet. And then we have these options for adding metadata, and metadata is great because it tells the LLM a little bit more.
All right, with that we're done with our workflow. So we're just going to hit execute workflow, and we're going to see how this works. You can see that we've looped over the items, and there we go: we have created the vector storage of our data. That didn't take very long at all.
David (15:57)
Wow, that's awesome. How many tweets did we embed?
Ilan (16:00)
Good question. We embedded 10 tweets there.
David (16:02)
Okay, good enough for a proof of concept.
Ilan (16:05)
Absolutely.
David (16:06)
That's cool. So if our audience wanted to add more properties here, they would need to add, I guess, more columns to that Google spreadsheet?
Ilan (16:21)
In our case, this all got prepped by that prepare for embeddings code node, and it grabbed it from the metadata from Twitter itself. So those responses that we get from Twitter have some metadata associated with them, like
who is the author, how many likes, how many replies, how many retweets, et cetera. And you can grab more data from the Twitter API.
David (16:47)
Okay, great.
Ilan (16:48)
So let's say you wanted to add another piece of metadata, for example, total engagement. Then you'd simply click add property, give it a name, and then drag and drop the field that you want in there. And that's it. Now this is part of the metadata the next time you run this flow. That's right.
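As a sketch of that derived-metadata idea, here's roughly what computing a total engagement property would look like if you did it in a code node instead of drag-and-drop. The field names are assumptions, and this is not the actual workflow's code:

```javascript
// Derive a totalEngagement metadata property from a tweet's counts
// before loading it into the vector store.

function withTotalEngagement(item) {
  // Default missing counts to 0 so partial metadata doesn't produce NaN.
  const { likes = 0, replies = 0, retweets = 0 } = item.metadata;
  return {
    ...item,
    metadata: { ...item.metadata, totalEngagement: likes + replies + retweets },
  };
}

const item = {
  pageContent: "Ship small, ship often.",
  metadata: { likes: 10, replies: 2, retweets: 3 },
};
console.log(withTotalEngagement(item).metadata.totalEngagement); // 15
```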
David (17:06)
Easy peasy.
All right. So it looks like this flow is setting up the data store. Are we now going to be talking to a chatbot to get information from that?
Ilan (17:20)
Great question, David. So I mentioned earlier, this is a two-part flow. And so that gets us to the second part.
All right, here we are. So we're gonna build the agent that actually chats with this data. This is a separate workflow in N8n, so you're building two different workflows: one to store the data and one to chat with the data. And the heart of this is just the AI agent node from N8n.
You'll see there's a system message here. This is provided in the workflow that you can just import into n8n, and you can go through it step by step to understand what we've instructed this agent to do. But there are some important steps here.
First off, like with any other AI agent, you need to give it a brain, and that's the chat model. So here we're using the OpenAI chat model. You can use whichever LLM you prefer. You could even use Cohere since you now have an API key for that. I chose to use the OpenAI one because I already have an account.
David (18:22)
Okay.
Ilan (18:22)
Second, we're going to add a simple memory node. This is a default that's available from N8n. It doesn't cost anything. And this is important because it will allow our agent to remember the conversation that it's having. So you can follow up with previous questions that you've asked.
David (18:41)
So you can have an actual conversation.
Ilan (18:44)
Exactly.
Third, we're gonna start adding tools for our agent. The important one here is the Supabase vector store tool. That's here; we've given it a name, search tweet database. And importantly, you need to make sure that it's pulling from the same table where you saved the information. Because if it's not pulling from the same table, it is not going to search through the documents that you've uploaded. Another important thing is to toggle this include metadata on.
David (19:14)
And I guess that's to give it more context, right? For understanding what it is that it should be looking for.
Ilan (19:22)
Exactly. So all that metadata that we saved in the last workflow, it will now have access to, so that it can use it to understand the content.
We also have a description here. A description is required for this tool and this explains exactly what it will be looking for. This description can be generated by an LLM and it is part of the workflow that you can just upload into n8n that we're providing in the show notes.
So, very important here, this needs to use the same embedding model that you used to embed the data originally. So we're gonna connect this embeddings cohere model just like we used in the storage node.
And we're going to use the exact same model here, embed-english-light-v2.0 with 1024 dimensions. If you use a different model, you're going to get bogus results. A different model has a different way that it vectorizes that data, that it understands that data, and so the numbers that exist in the database will mean nothing to this agent.
David (20:33)
Okay, sounds critical.
Ilan (20:34)
All right, and then last step here, we're just gonna connect two tools. One is the Google Sheet, where we saved the raw data as well. This is helpful for the agent if you're just looking for basic data that it doesn't need to go into the tweets for. If you just wanna like, hey, give me the top five tweets from this week for retweets.
It doesn't need to use the vector database for that. And then lastly, we're going to connect the think tool. The think tool is basically a scratch pad for the agent to use during its process. We've talked about this in previous episodes, but AI agents have a very poor short-term memory. They basically make a decision, execute on it, and then by the time they get the results, they've forgotten why they made that decision. So the think tool allows it to write down why it did something.
David (21:23)
Okay. I like the aspect of adding the sheet to it because I think it's a good reminder for everybody that sometimes you don't need to do something terribly fancy, right? Give it access to something that's very straightforward. So that's great that you have that there.
Ilan (21:42)
Totally.
All right, and now we're done. We're going to test this out. Let's see what happens. So, let's find out: how are people using Claude Code these days?
David (21:50)
I like that it's put together a plan first, if we look at the thinking there.
Ilan (21:53)
Mm-hmm.
Exactly. And so there we are. After maybe 20 seconds, we got an answer. It tells us, from tweets, how people are using it.
And so we see five different examples here. But now we can do something else,
So I'm just going to tell this to format this as a paragraph with sources cited afterwards.
David (22:12)
Let the record show that you did not say please.
Ilan (22:14)
I'll be the first to go when the AI army takes over. And there, that just took about six seconds. We learn here that people are increasingly using Claude Code as a terminal-native agent that can take on end-to-end work. There's all this other information, and here we can see all of the sources with their links. So there we have it. We've gotten an answer in a nice format,
Ilan (22:39)
and we know exactly where this information came from.
David (22:43)
That's so cool. Yeah. And I noticed that it didn't have to go and look at the database again. It sort of just figured that out from its previous response, thanks to everything that you've got set up there. That's awesome. I think about this as a great way of understanding how RAG workflows are structured.
It really helped us to understand this concept of vector stores and what's involved in that, at least at a level that is appropriate for product people, right? Not necessarily for an ML engineer, but, you know, for the rest of us, I think it's important to know what's going on. The other thing that I think about also is that
n8n is something that can run locally. So for those who might be a bit more technically inclined or ambitious, perhaps you can set this up locally and then have that RAG across all of your documents: say, not work documents, but your personal documents, right? And so that could be super powerful.
Ilan (23:43)
Yeah, I mean, that workflow where we embedded documents into the Supabase vector store, the documents that you provide there can come from anywhere. You can reuse that flow. You can grab your local documents. You can grab articles that you find from Perplexity, whatever you are thinking about. The world is your oyster.
David (24:02)
Awesome. Well, hey, thanks for walking us through this, Ilan.
Ilan (24:04)
Yeah, absolutely. And for everyone out there, thank you so much for listening. We hope you enjoyed it. Are there other concepts that you'd like us to walk through in N8N or elsewhere on Prompt and Circumstance? Let us know in the comments. Otherwise, if you found this helpful, share it with somebody you know. It'll really help the podcast, and we hope that it'll help them out too. See you next week.
David (24:27)
See you next time.