No Limits! How to Generate Video Locally for Free
David (00:00)
was enormous. Video work though, dude, if you think image generating is complex, video generating is another monster. Okay.
Ilan (00:02)
Hahahaha
So how do we make this simple for people?
David (00:13)
I was thinking the same thing: look, you don't need to touch everything. You only need to touch these essential places. You don't have to worry about anything else. Yeah. Okay.
Ilan (00:54)
All right, on two previous episodes, we showed you how to generate images and audio locally. And so logically we had to get to video. So that's what we're gonna be showing in today's episode.
David (01:04)
And it's a lot more straightforward than you might think.
Ilan (01:06)
So David, at the end of the last episode, you showed a music video for the song that we had generated in that episode. So how'd you do that?
David (01:14)
Well, let me walk you through the steps. It's using ComfyUI.
Ilan (01:18)
I'm shocked. Let's see.
David (01:19)
Hahaha
Ilan (01:20)
This week's episode is brought to you by Querio. Have you ever found that your data team is bogged down in ad hoc questions from users, or that your product team can't quite get the answers that they're looking for? Querio's AI agent sits on top of your data stack and allows you to ask natural language questions to get the answers that you need immediately.
Try it today by going to querio.ai. Let them know that David and Ilan sent you for two months free of their Explore product. That's a thousand-dollar value.
David (01:48)
All right. Well, first, I generated the images. So for those who aren't familiar with how to use ComfyUI to generate images, watch our previous episode. And for those who aren't familiar with using ComfyUI to generate music or sound, we also have another video on that, right? And so what I've done so far is I've got the image.
And I've got some audio generated where we have this music star who's singing about vibe coding. And again, it's just a very straightforward process. It looks intimidating. We're just modifying some settings here. So.
Ilan (02:25)
And all of this, again, is just in a file that you can download online that has a model and all of these pre-configured connections for you.
David (02:37)
Yeah, so the workflow is the one file that you would download and plug into ComfyUI. You just kind of drop it in once it's running, and then it'll be good to go. Now, what you're going to want to do is download the models as specified here, and you'll see the same on the other workflow that I'm going to show you. So these are links; you can just go ahead and download these models via these links, and then you put them
Ilan (02:47)
Okay.
David (03:06)
into your ComfyUI directory as shown here.
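For reference, the folders David points at follow the standard ComfyUI install layout. A minimal sketch, assuming a default install; the workflow's own notes are the authority on exactly which subfolders it reads:

```python
import os

# Typical ComfyUI model subfolders referenced in this episode. Treat this as
# an illustrative layout, not a complete or authoritative list.
COMFY_ROOT = "ComfyUI"
SUBDIRS = [
    "models/diffusion_models",  # WAN / InfiniteTalk checkpoints
    "models/vae",               # variational autoencoders, e.g. the Qwen Image VAE
    "models/loras",             # LoRA files picked in the LoRA Select node
    "models/wav2vec2",          # audio encoder used for lip sync
]

for sub in SUBDIRS:
    path = os.path.join(COMFY_ROOT, sub)
    os.makedirs(path, exist_ok=True)  # downloaded files go into these folders
    print(path)
```

Dropping the downloaded files into the matching folder is what lets the workflow's loader nodes find them by name.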
Ilan (03:09)
And what model are you using today?
David (03:11)
And we are using... well, for the image editing, we're using Qwen 2.5, specifically the Qwen 2511 image edit model, and a quantization of that model, BF16. And that's because my video card isn't all that great. It's not the kind of card that would be restricted for export.
Ilan (03:35)
That's a good point. So the quantized models, we talked about them in a previous episode, but those are where they sort of downscale the model a little bit so that it can run on lower performance video cards.
David (03:48)
Yeah, exactly.
You sort of try to make it smaller by removing all the things that aren't as impactful for your output. Yeah.
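To make the trade-off concrete, here is some back-of-the-envelope math: fewer bytes per weight means less VRAM. The parameter count below is hypothetical, not the actual size of the Qwen model:

```python
# Rough VRAM math for why lower precision / quantization matters.
# Numbers are illustrative only.
def weights_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """VRAM needed just to hold the weights, ignoring activations."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

# Hypothetical 20B-parameter model at different precisions:
for name, bpp in [("fp32", 4.0), ("bf16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]:
    print(f"{name}: ~{weights_size_gb(20, bpp):.1f} GB")
```

Halving the bytes per weight roughly halves the footprint, which is why a Q4 file can fit on a card that an fp32 checkpoint never would.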
Ilan (03:55)
Got it.
Cool.
Alright, so what next?
David (03:58)
All right.
So I had this image generated. And in fact, what we're going to do today is, I'm going to take this image and modify it so that it looks like she's in a music studio, singing, right? So I've already done that. And you can see here, I really didn't need to change any of the parameters, right? So,
The prompt is also very straightforward. I just simply say instead, she's in a recording studio singing a song. Close up shot.
And so if I were to open this node here... again, everything in here is for the tinkerers. This is where you would specify, okay, it's specifically the Qwen image edit model. And this is where you would point it at the model that you have loaded or downloaded into your system.
And likewise for the rest of these, you have your VAE, your variational autoencoder. Again, just a technical term. Hey, the VAE is the Qwen Image VAE that you download. Cool, just point at it, right? So once you have that set up, you don't need to touch anything else, really. Everything else here is set nicely for you. And so you just come back here,
Ilan (04:54)
Okay.
David (05:07)
give it a prompt, give it the image, and away you go. And we showed this workflow, or one similar to it, last time, I believe, where you could prompt it to generate a new image, or you could give it an image to combine. And this is how I actually got her to be wearing our shirt.
Right? So originally I had generated an image of somebody who is just, hey, you know, a cowgirl in a wheat field. And then I threw it into here to say, all right, now this person is wearing this shirt. And that shirt is now available for purchase at devilwearsproduct.shop, available today.
Ilan (05:39)
Where is that shirt available, David?
amazing.
David (05:47)
All right, so as you can see here, I've bypassed these nodes because we don't need to modify anything else when using an image. So it's just kind of going straight through here. And so I've already generated this image where she's in a studio and singing. The intention being that, hey, why don't we generate a video of her singing, say, the last little bit, the chorus, let's say, of the song that we had generated.
Ilan (06:13)
Now David, I've noticed in the past that when you bypass nodes, you use a hotkey. What is that? Okay, good for folks to know.
David (06:19)
Yes, that's right. It's Ctrl+B for bypass. So, yeah. If you want to do it manually, you just right-click on the node (on Windows, I suppose) and there's a bypass option right there.
Ilan (06:32)
Okay.
David (06:33)
Okay, so now that we have this image ready to go, we are going to use another workflow to turn that into a video.
Okay. So we are going to be using something called InfiniteTalk. All right. And InfiniteTalk is not itself a model per se. The heart of it is actually WAN, and a whole bunch of WAN-related stuff.
And you know, normally, in order to generate even five seconds of video, it's very strenuous on a video card, on consumer-grade video cards. And that's what makes InfiniteTalk something special. Just because of the way that it generates the video, I think it does it sequentially, it just manages memory nicely. I don't know how it does it. It just does.
Where theoretically you can have very long videos. Like, I've personally generated videos that are two minutes long using this. And again, I just have consumer-grade hardware. So, very powerful model.
Ilan (07:34)
Okay, so InfiniteTalk is a model that's specifically from WAN?
David (07:38)
I don't know whether the WAN team made it, but it's based off of WAN. Yeah.
Ilan (07:41)
Okay.
Okay,
got it.
David (07:45)
Okay, all right, so.
Ilan (07:47)
That was also the backup name for this podcast. No, Infinite Talk.
David (07:50)
WAN?
If it's talk.
That's right. If ever you don't like small talk, we've got infinite talk for you.
Ilan (08:00)
Mm-hmm. That's right.
David (08:03)
Okay. And again, we'll provide this workflow.
And again, the links are simply available here for you to download what it is that you need. Okay. And so what you do is you come here to pick out where exactly you have your downloaded files. So here in the WAN Video LoRA Select, I just point it at this file that I have downloaded. Okay. So this is the image-to-video, I2V.
And you see that it's actually 480p. That's the resolution that it generates, the video resolution, I should say. And then over here, this is the heart of it, which is the WAN 2.1 (or 2 underscore 1) InfiniteTalk model. And then "single Q4", don't worry about that, that's the quantization. And again, this is just available for download
Right here. Okay, so that's where you would download that. All right, next. GG. Good game. All right, and then the next thing is your VAE. You know what it's called? Your variational autoencoder. That's exactly the term that you want to bring up at parties, in case you're just tired of people talking to you.
Ilan (08:59)
GG indeed, GG indeed.
That's right. Also useful if you're younger than us and single and you really want to meet somebody, definitely talk about that when you start talking to a new person.
David (09:26)
Yes, exactly. All right, so same thing here. You just download the model, or the file, I should say, and then you specify it here. Just say, hey, this is where it is. It will automatically pick it up if you put it into the VAE folder of the models area. And that should be pretty straightforward for you.
Ilan (09:48)
Okay.
David (09:50)
Okay, and then you have this piece of it, which is the CLIP model. And again, that's a technical term. Either way, download it, specify it here. Okay. And then over here, now we get to choose our input image. So this is the image that we had generated, and so I uploaded it here.
Okay, pretty straightforward there. Okay, now it's time for audio. All right, so I've got this little audio clip that I sort of snipped out of the full song that we made. Here's how it sounds.
Okay. So it's a 30-second audio clip.
Okay, so now that we have that audio clip uploaded, we're going to come down here, and there are just a few more things to set up. And again, it's not technical; you just have to download the file and then, you know, upload it or specify it here. Okay. All right. So you can see here in the instructions, it'll tell you exactly where to put stuff. Okay. So with this particular model, right, this is where you get it: download that, put it into
ComfyUI/models/diffusion_models. All right, done. And then just specify it. Okay. And then over here, note that this bottom node is not connected, so don't worry about that. It's really this one here that you would care about. So here is the link. So it's wav2vec2. And so you come to this Hugging Face repo, you download it, put it into ComfyUI/models/wav2vec2
and then you just choose it here. All right, and I think that's all of the files that you need to specify. And then there are a bunch of other nodes that you don't need to care about.
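Before queueing a long render, it can help to confirm that every file the workflow points at actually exists on disk. A small sketch; the file names below are hypothetical placeholders, not the real names from the workflow's links:

```python
import os

def find_missing(models_root: str, required: dict) -> list:
    """Return the paths from `required` ({subfolder: filename}) that are absent."""
    return [
        os.path.join(models_root, folder, fname)
        for folder, fname in required.items()
        if not os.path.isfile(os.path.join(models_root, folder, fname))
    ]

# Placeholder file names; substitute whatever you downloaded from the
# links shown in the workflow's notes.
required = {
    "diffusion_models": "wan2_1_infinitetalk_single_Q4.gguf",
    "vae": "wan_vae.safetensors",
    "wav2vec2": "model.safetensors",
}
print("missing:", find_missing("ComfyUI/models", required) or "none")
```

If anything shows up as missing, the corresponding loader node in ComfyUI will fail at queue time, so this is a cheap check to run first.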
We just have four more things to go. Okay. All right. We've got to specify the height and the width of our video. So if this were a portrait, right, then it would be the other way around: the width would be 480 and the height would be 848, right? This again is only designed for 480p, so I would not go above this kind of resolution. There are ways to upscale
Ilan (11:56)
Got it.
David (12:05)
the video, which is something we can talk about another time. We could have a whole talk about upscaling, because I think that's really important for those who want to generate images and videos. Now, this parameter here is the maximum number of frames to generate for the video. Okay. And now why is it 750 that I've set it to? Well, let me tell you.
So over here, in this node, you'll see that there's a frames-per-second parameter, which again we don't need to modify; just observe that we are operating at 25 frames per second. Okay, so now, our audio clip is 30 seconds. So do the math, right? 30 times 25
equals 750. All right, so that's how we arrive at that value.
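The frame-count arithmetic, together with the width/height swap from a moment ago, sketched in code (848 as the long edge comes from the episode; the divisible-by-16 note is a common video-model constraint, not something stated here):

```python
# Frame budget = clip length x frames per second.
FPS = 25            # fixed by the workflow's settings; observe, don't modify
AUDIO_SECONDS = 30  # length of the snipped chorus

max_frames = AUDIO_SECONDS * FPS
print(max_frames)  # 750

def video_dims(portrait: bool, short_edge: int = 480, long_edge: int = 848):
    """Return (width, height) for the chosen orientation at the 480p target.

    Both default edges happen to be divisible by 16, which many video
    models expect of their input dimensions.
    """
    return (short_edge, long_edge) if portrait else (long_edge, short_edge)

print(video_dims(portrait=False))  # (848, 480)
print(video_dims(portrait=True))   # (480, 848)
```

So a longer audio clip means a proportionally larger max-frames value, and switching orientation is just swapping the two numbers.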
Ilan (12:52)
Okay.
David (12:53)
Okay, and here's the last thing. This is the most difficult thing: you need to provide a prompt. You need to say something difficult like "A woman is singing." Which is also what's great about this model: you don't need to do anything fancy. It can be very straightforward. You know, a man is talking, a woman is talking. Of course, you can add more to this prompt.
Ilan (12:58)
Hmm.
David (13:18)
So what can happen with the videos that this model generates, I found, is that the subject of the video tends to move their hands around a lot for whatever reason. Okay. They just get really, really nervous or something. And so if you have a character where, ⁓ let's say that they have like nail polish on, when those hands go out of frame, it's not going to remember.
Ilan (13:30)
You
David (13:45)
the color of the nail polish. So that person will be, yada yada yada, waving their hands, and then it'll be red nail polish, and then suddenly white nail polish, and then purple nail polish. And it's like, okay, well, let's make that consistent, right? So there is more that you can add to the prompt here just to make sure that things are consistent. But for our example here, we are going to keep it simple with just "a woman is singing."
Ilan (14:08)
Okay.
David (14:08)
And then that's it. So you can see this is highlighted green because it's currently running; I wanted to generate the video for us all to see right here.
Okay. And you know how I said that for the video, if you want a portrait, you do it the other way around? Well, I was silly, and it's a square. I didn't do it right, but whatever. Let's have a look at this video. It's looking pretty good.
Okay.
Ilan (14:43)
Big yawn at the end. She looks like my daughter.
David (14:45)
There we go, here we go again.
Not bad. Pretty good for, you know, generating on consumer hardware, right?
Ilan (15:02)
Not bad.
Yeah,
Is there a way... you know, in the past you've shown me a couple of other examples that you've made, and I've noticed that the lips get kind of out of sync with the audio. Is that just a factor of the model, or is it something that can be changed with some settings and maybe a little bit more time?
David (15:18)
Yeah.
I'm not aware of any settings to make it lip-sync any better. I think that's just a limitation of the model. Yeah. So what I would recommend, if you want a single shot, is to generate it multiple times, just roll the dice multiple times. And for the parts that work... because I think you're always going to have some pieces where the lips don't sync, you just kind of cut it where it makes sense. Yeah.
Ilan (15:33)
Got it.
Mm-hmm.
between them.
Cool. Well, this was really cool, David. I really appreciate you showing this to us. I think that's fantastic. I mean, all on just the video card that's in your computer.
David (16:02)
Yeah, yeah, any of you can do this.
Ilan (16:04)
Amazing. Well, with that, thank you very much for watching. We hope you enjoyed this, and we'll see you next week.
David (16:11)
See you next time.