Privacy First! Setting Up a Local Image Generation Model

David (00:00)
Okay, all right. So in all of 33 seconds, we've got this locally generated image of somebody wearing our merch. How about that?

Ilan (00:01)
There we go.

That's fantastic.

David (00:11)
Hey everybody, welcome to Prompt and Circumstance. My name's David.

Ilan (00:14)
and I'm Ilan.

On today's episode, we're gonna go through what it looks like to have a local media generation setup on your machine. We're gonna talk about a couple of the models that you can use. And then we're gonna go through the process of generating images and then editing images.

And all locally, all free, and all open source.

Ilan (00:50)
This week's episode is brought to you by Querio. Have you ever found that your data team is bogged down in ad hoc questions from users, or that your product team can't quite get the answers they're looking for out of your data? Querio's AI agent sits on top of your data stack and allows you to ask natural language questions to get the answers that you need immediately.

Try it today by going to querio.ai, and let them know that David and Ilan sent you for two months free of their Explore product. That's a thousand-dollar value.

Ilan (01:18)
All right, David, you have been playing around with local image generation, a way to generate images with AI without having to pay anything more than whatever your local hardware costs. So why don't you tell us more about that?

David (01:34)
Yeah, it was surprisingly easy after I got over the fear of installing it and getting it all set up. It was pretty straightforward, and what I was able to do is create stock photos for our clothing line.

So for example here, hey look, here's somebody who's wearing this shirt that we've designed. And over here for this other shirt, hey, here's somebody else wearing that. So none of these people actually exist.

And I think it's something that is really interesting to be able to do locally on consumer-grade hardware.

Ilan (02:16)
Yeah, these look hyper realistic. I would not at a glance say that these are AI generated. So yeah, how did you get this working?

David (02:25)
All right. So I used something called ComfyUI as the UI to help me orchestrate all of these image-generating models that are "open source". And I would say open source with quotes because you don't have the actual source code. You have the weights, right? So open weights versus open source. Yeah, there's a bit of a distinction there, but nevertheless,

It's something that you can simply download and run locally on your machine.

So here I am inside of ComfyUI, and this is something that is running locally on my machine. And again, I'm not using anything special, just a consumer-grade piece of hardware. And you might notice that this looks very similar to n8n,

where the idea is that there's a flow of these different nodes that are connected to say, look, here's the prompt, the prompt flows into this collection. And if I open this collection, you'll see, hey, I'm loading this model, I'm loading some other components here, I'm making specifications as to what it is that I'm creating.

There's some other controls in here, like a negative prompt. And then there's some parameters here that I can tweak, such as the seed, for example.
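
For listeners who prefer a script to a node graph, the knobs David mentions, the prompt, a negative prompt, and the seed, map directly onto a few function arguments in the Hugging Face diffusers library. This is just a minimal sketch with a widely available checkpoint, not the exact models or workflow from the episode:

```python
# Minimal local text-to-image sketch with diffusers (illustrative only;
# not the ComfyUI workflow from the episode). Requires a CUDA GPU.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # any locally supported checkpoint works
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="urban alleyway at dusk, a tall high-fashion model, full-body shot",
    negative_prompt="blurry, low quality",               # things to steer away from
    num_inference_steps=30,                              # fewer steps = faster, rougher
    guidance_scale=6.0,                                  # how strictly to follow the prompt
    generator=torch.Generator("cuda").manual_seed(42),   # fixed seed = reproducible output
).images[0]

image.save("alleyway.png")
```

The Qwen-Image and Z-Image pipelines discussed in the episode expose roughly the same kinds of parameters, just through their own loaders and ComfyUI nodes.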

Ilan (03:43)
So David, I'm not gonna lie man, this looks a little complicated, so convince me that this is simple.

David (03:50)
Yeah, so it's a little bit like opening the hood of a car and you'll see all these hoses and couplings and, you know, all these different pieces. But at the end of the day, you only need to control the gas and the brake and the steering wheel, maybe a few buttons here and there. And that's really what this is about. You know, there's a lot of things, sure, that you can tinker with, but at the end of the day, really, it's the prompt that I would be working with and maybe just a few small parameters.

Ilan (04:19)
So when you load this into ComfyUI, are you getting that kind of pre-built flow or did you have to build that from scratch?

David (04:28)
That's a great question. And the answer is I didn't need to build anything myself. The ComfyUI documentation covers each of the models they have support for, and as part of that documentation, you can simply download what's called a workflow. So here,

in this documentation, you see that you can just download the JSON workflow file and simply drag it into ComfyUI, and it'll load that up.
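
As a side note, dragging the JSON in isn't the only way to run it. ComfyUI also runs a small local web server, so a downloaded workflow can be queued from a script. A rough sketch, assuming the default port and a workflow exported in ComfyUI's API JSON format (a separate export from the UI-format file in the docs):

```python
# Queue a ComfyUI workflow over the local HTTP API instead of dragging it
# into the browser. Assumes ComfyUI is running on the default port 8188 and
# the workflow was exported in API format (the dev-mode / "Export (API)" save,
# depending on your ComfyUI version). The file name is hypothetical.
import json
import urllib.request

with open("workflow_api.json") as f:
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # returns a prompt_id you can use to track the job
```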

Ilan (05:04)
Okay, very cool. So that whole workflow that I said looked pretty complex is something that just gets wired up for you. Somebody has already pre-built it, you're loading it in, and then you can, as you said, tweak a few buttons or a few parameters, the things that you need, or you can just let it run as is.

David (05:24)
Yep, exactly. So for example, if I were to simply drag over here, you can see if I just drop one of the workflows into here, it just loads that up, it parses it correctly, and then all I need to do is probably just fill out the prompt.

Ilan (05:39)
Okay, cool.

David (05:40)
So why don't we bounce back here? So what I am starting off with here is using the model Qwen-Image 2512, as in 2025 December. That's how recent these models are. And we can compare, let's say, two different models, right? So there's this Qwen model, and then there's this Z-Image-Turbo model that we can contrast here.

So here it's come with this default prompt, so why don't we go ahead and run that and see how it goes.

Ilan (06:15)
And what's this prompt generating? Like, what's in the prompt there?

David (06:19)
The prompt is saying: urban alleyway at dusk, a tall, statuesque high-fashion model striding elegantly, mid-distance full-body shot from an angular perspective. And there's a whole lot of other text here that not only describes the composition and the content of the image, but also some technical aspects of it.

Ilan (06:40)
Now, a lot of times the models that we might be used to using in the cloud, right, if it's OpenAI's image generation or Nano Banana or Google's Imagen, you don't have to specify a lot of this detail. They kind of infer it based on your prompt. So a lot of us have gotten maybe a little lazy with our prompting for image generation.

How do you find these models work with lazy image prompting versus these kinds of hyper-detailed prompts?

David (07:13)
I would say that there's no difference. With the retail, commercially available image-generating models, you can be very simple with your prompt, and you can also be very precise and provide lengthy prompts. It depends on the degree of control you want over the output.

Ilan (07:30)
Mm-hmm.

David (07:38)
And so it's the same thing here. We can certainly test that today, where we run it against a very simple prompt versus a bit more controlled one. Yeah, it's really about how much you want to define the output.

Ilan (07:53)
Okay, cool.

Now I see that this does not work at Nano Banana speeds.

David (07:58)
That's a really good point. And I think that's an excellent point of contrast between Qwen Image 2512 and what we're going to see with Z-Image-Turbo. Turbo being the emphasis here.

Ilan (08:12)
Is there a minimum hardware requirement to run these? Like, I have an M1 MacBook Air. Do you think I could run a local image-generating model, or does this require a dedicated graphics card, kind of like a higher-level graphics card?

David (08:29)
That's a good question. You certainly cannot run it on very light hardware. So these do have some VRAM requirements, as in video RAM, the RAM that sits on your video card. And so what I've got here is a card with 16 gigabytes of VRAM, which is a little bit on the expensive side for, you know, retail hardware. It cost me about a thousand dollars Canadian for the card alone.

So for your machine, I would look, you know, on a per-model basis, at what the requirements are. Some of them are actually very light. And then some of the models actually have what's called a quantized version, which basically takes what would have been a very large VRAM requirement and distills it down to something lighter.
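
As a rough rule of thumb, the weights alone need about (parameter count × bits per weight ÷ 8) bytes of VRAM, and the text encoder, VAE, and activations add more on top. A quick back-of-envelope for a 9-billion-parameter model:

```python
# Back-of-envelope VRAM needed just to hold the weights of a 9B-parameter
# model at different precisions (other components add more on top).
def weight_vram_gib(params_billion: float, bits_per_param: int) -> float:
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1024**3

for bits in (16, 8, 4):  # fp16/bf16, 8-bit, 4-bit quantized
    print(f"9B params @ {bits:>2}-bit ≈ {weight_vram_gib(9, bits):.1f} GiB")

# 9B params @ 16-bit ≈ 16.8 GiB
# 9B params @  8-bit ≈  8.4 GiB
# 9B params @  4-bit ≈  4.2 GiB
```

Which is roughly why a 16 GB card is comfortable for a distilled or quantized model but tight for the larger full-precision ones.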

Ilan (09:21)
And then you can make 8-bit Mario instead of the photorealistic model walking down the alleyway.

David (09:27)
Yes, the photorealistic Mario with all the skin, the pores and everything, exactly. So here we see that it took 242 seconds to generate the image. Let's hop over and see what we got.

David (09:40)
Okay, so here we are back in ComfyUI. We've got this preview image view here, this node. And you can see it's not bad. It looks like it's sort of a high-fashion kind of photo. I can open this up, right-click on this, and go to open image. And here I can view the full image. This is approximately 2K resolution. So that's what we got.

So not bad for something locally generated. Now what we can do is take the same prompt and hop over here to Z-Image-Turbo. Again, you can see a very similar flow here where there's a prompt, there's this package here that does the same thing. It loads the models and then there's some controls here which we don't need to care about. So I can just copy paste that same prompt into here and I can run it against this model.

And I think we're going to notice just how much quicker Z-Image-Turbo is. And you know, the funny thing about these two models is that both of them came out of Alibaba's labs. So it's almost as if they have these competing teams to see who can make the best kind of image model.

So there we go. In just a few seconds. How long was that? That was 34 seconds to generate this image. And I can open this up.

And you can see what we got.

Ilan (10:59)
Very similar and still super high quality.

David (11:03)
Yeah, right. So there's Z-Image and there's Qwen-Image. So, you know, pretty close, pretty close.

Ilan (11:09)
Mm-hmm.

David (11:10)
All right.

Ilan (11:10)
Wow, this is really cool, David. I think it's amazing what you can generate so quickly.

Have you found an advantage to using Qwen-Image, you know, using about seven times as much time and processing power to generate a very similar image? Maybe it doesn't scream the value of that model, but is there something else that it's really good at that Z-Image maybe falls short on?

David (11:37)
I'm just scratching the surface. I suspect that what it's better for is when people want to add more control on top of the model. It probably has more capabilities to do that. I think one of the sacrifices that the Z-Image team made with Turbo is that it's a bit less controllable.

Ilan (11:47)
Hmm.

David (12:01)
However, in both cases, you can use what are called LoRAs, low-rank adaptations, which will basically modify what the model is able to do. So think of it as, well, the model knows sort of everything about every kind of image, but the LoRA will just kind of remind it of something that is particularly important for your use case. So there'll be LoRAs that are like, okay, this is great for making something look like a Polaroid photo. This is great for making something look like Japanese anime. This is great for making watercolor images, watercolor painting simulations.
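
If you're scripting rather than using ComfyUI's Load LoRA node, applying one of these style LoRAs is typically a couple of lines in diffusers. The base model, LoRA file, and scale below are hypothetical placeholders:

```python
# Applying a style LoRA on top of a base text-to-image pipeline
# (diffusers sketch; the LoRA file name and scale are hypothetical).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# The LoRA nudges the base model toward one narrow style, e.g. watercolor.
pipe.load_lora_weights("path/to/watercolor_style_lora.safetensors")
pipe.fuse_lora(lora_scale=0.8)  # how strongly the LoRA influences the result

image = pipe("a lighthouse on a cliff at sunrise").images[0]
image.save("watercolor_lighthouse.png")
```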

Ilan (12:34)
And is there a LoRA(x) that speaks for the trees?

David (12:36)
There just might be. Okay, let's move on to image editing, because that's also something that we can do locally.

Ilan (12:38)
Yeah

Ilan (12:46)
This week's episode is also brought to you by n8n.

Its beautiful UI allows any user to go in and experiment with building AI agents, but also create the types of workflows and evals that are required for production level agents.

Try it today by visiting the link in our show notes and you'll get two weeks free.

David (13:05)
All right. So here I am with this other model loaded, which is FLUX.2 [klein], 9B as in 9 billion parameters, distilled.

All right, so FLUX.2 [klein] is made by Black Forest Labs out of Germany, I believe. And so this is an image-generating model, but it's also a great image-editing model. So let's give it a test.

Okay, so here we are back in ComfyUI and I have this workflow loaded again just off of the documentation from ComfyUI, where in here we are going to be using the Flux 2 model. And here I've got this image loaded, which might or might not be completely synthetic. Nevertheless, we have this fighter and why don't we see what we can modify about this image.

Ilan (13:58)
Now I assume with the Flux models you gotta get it going to 88 miles per hour before it'll make any edits.

David (14:04)
That's right, there's a capacitor in there that is really important to this.

Ilan (14:06)
Ha ha ha ha

David (14:10)
All right. So I've asked it to change her gi color to pink and set the background to a waterfall in a forest. Let's see how this does.

All right, so that took 37 seconds. Not bad. Probably a little bit slower than Nano Banana, but again, it's running locally and it's completely free. So here we are with this image that it generated. Let's open this up.

So, right, I made her gi pink and I put her into that kind of background where there's a waterfall and a forest. It's all right, it's all right. So, you know, when it comes to modifying images, I think it does a decent job. So why don't we come back here to this workflow. And you might notice that there's also this section down here where we can combine multiple images. Right. So,

what I did there was I modified the image with only text. Well, now we can modify the image with text and another image.

Ilan (15:07)
I see where you're going here.

David (15:09)
So I can select these nodes, make them bypassed.

coming down over here. So now instead of using the flow up top, I'm going to use the flow down below.

All right. So what I've done is I've provided this image of a certain mask and claw of a certain fighter from the streets. And I provided this prompt here to say, make her wear the mask and claw. And that's it. So I'm trying to test out what you described there, Ilan, where the prompt is a little bit simple. So let's give that a go.

Ilan (15:41)
Let's do it.

David (15:42)
All right, so that took only 14 seconds. And here we go. Here's the output. Let's open that up bigger.

Okay, so, well, we got the mask all right. The claws, not so much, not exactly the same claws. But I mean, the spirit, I think, is in the same vein. I think that's a pretty decent job. If you look at the lighting on the mask, I think that pretty well matches the lighting in the, I guess, the cage that she's in. And it also captured the nature of the material too, sort of plasticky.

Ilan (16:15)
Absolutely.

All right, David, so after that last example, I'm seeing the path to creating those stock images that we used on our website. So do you want to take us through that flow?

David (16:28)
Yeah, sure. So the first step is, let's generate a generic image of somebody ready to have our particular branded clothing put onto them.

All right, so let's give the prompt of a Latina female wearing a simple black tee, set in a seaside setting. So this is using Z-Image-Turbo, so I think this will be pretty quick.

Okay, so that took all of 24 seconds, 25 seconds. And here we go. We have an image of somebody wearing a black tee, ready for us to modify that black tee and put one of our branded designs on there. So I'm going to save this image and then re-upload it into the image-editing workflow.
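
That save-and-re-upload step can also be wired up in a script if you're doing this in bulk: copy the generated file into ComfyUI's input folder, patch the editing workflow's load-image and prompt nodes, and queue it on the local server. The paths and node IDs below are hypothetical; they depend on your install and on the specific workflow JSON:

```python
# Sketch of chaining generation -> editing without the UI. All paths and
# node IDs are hypothetical; they depend on your ComfyUI install and on the
# particular workflow JSON you exported (API format).
import json
import shutil
import urllib.request

# 1. Put the generated image where the editing workflow's LoadImage node can
#    see it (ComfyUI reads uploaded images from its input/ directory).
shutil.copy("output/seaside_black_tee.png", "ComfyUI/input/base_model.png")

# 2. Load the editing workflow and patch its inputs.
with open("flux2_edit_api.json") as f:            # hypothetical exported workflow
    wf = json.load(f)

wf["10"]["inputs"]["image"] = "base_model.png"    # hypothetical LoadImage node id
wf["6"]["inputs"]["text"] = (                     # hypothetical prompt node id
    "Make the person in image1 wear the tee provided in image2."
)

# 3. Queue it on the local ComfyUI server (default port).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": wf}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```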

All right, so here I am back in the image editing workflow with Flux 2, and I've uploaded the image that we had just generated. And down here, we've got this nice little tee that's available from our store for all the senior vibe architects out there. And you are one. Yeah, there you go. So let's generate this image.

Ilan (17:32)
one.

David (17:38)
So what I've done is my prompt is very simple. Make the person in image1 wear the tee provided in image2.

Okay, all right. So in all of 33 seconds, we've got this locally generated image of somebody wearing our merch. How about that?

Ilan (17:45)
There we go.

That's fantastic.

And you'll be seeing this soon in our store.

David (17:58)
That's right, Devil Wears Product.

Ilan (17:59)
So David, I can see the value in this, but everything you showed here are things that you could do in sort of the traditional cloud-based tools. So why don't you walk us through the advantages of doing this locally?

David (18:11)
Sure. So first things first, with the data that you provide to those image-editing and image-generating models, depending on your license tier, and I think most of us aren't at the enterprise level, you might be providing your likeness and other confidential data for training. So here, when you're doing this locally, it's all local and so completely private.

Ilan (18:36)
Yeah, that's a huge advantage. The privacy aspect is something that I think a lot of people are concerned about with the main cloud providers and main LLM providers.

David (18:45)
Absolutely. And the other thing, of course, is cost. I mean, here we have the upfront cost: you've got to get a video card. But if you happen to be a gamer as well, you're using it for that anyway. So the fact of the matter is that if you'd otherwise pay for the use of those image-generating models, that's additional cost that you're saving here.

The additional thing is that I believe Nano Banana always puts a watermark on the image. I mean, that watermark could just be removed by

Ilan (19:14)
Mm-hmm.

David (19:19)
another AI image modifying tool, but nevertheless, it's a nuisance, depending on, again, your level of subscription. So that's another bonus here for generating it locally.

Ilan (19:31)
Cool, so you've got no watermarks, you have access to the latest models before they're potentially available from the cloud providers, you have privacy, and you have cost savings. Those are some pretty big advantages to going down the local generation path.

David (19:48)
Yeah, that's right. And again, the setup is really straightforward. You know, we're not going to walk you through that. There's lots of tutorials on how to set up ComfyUI locally. So I would suggest for anybody interested to go ahead and do that.

Ilan (20:01)
Sounds great. All right. Well, with that, that's all we've got for you this week. Thank you so much for watching. You can give us a like and subscribe. If you know somebody who you think would enjoy this and maybe wants to try this out themselves, then send them this episode. It really helps the podcast grow, and we really appreciate it.

David (20:19)
All right. Thanks for watching and listening, everybody. Catch you in the next one.

Ilan (20:22)
See you next time.

© 2025 Prompt and Circumstance