Total Freedom! How to Generate Audio Locally

We show how we generate music and speech entirely on a local machine using open-source models in ComfyUI, with no cloud subscription to ElevenLabs or Suno required. You'll see how ACE-Step 1.5 produces full pop songs from a text prompt and how Qwen3-TTS clones a voice from a short audio clip, all on consumer-grade hardware.


Chapters

00:00 - Intro and What We're Covering
00:56 - Making Music Locally with ACE-Step 1.5
02:47 - Setting Up the Workflow in ComfyUI
04:40 - Prompting for Songs: Descriptions, Lyrics, and Settings
10:22 - Generating an Instrumental EDM Track with Gemini
12:43 - Local Speech Generation and Voice Cloning with Qwen3-TTS
18:18 - Deepfake Concerns and Wrap Up

Sponsors
Querio - querio.ai
Devil Wears Product (Merch Store) - https://devilwearsproduct.shop

Links
ACE-Step 1.5 (GitHub) - https://github.com/ace-step/ACE-Step-1.5
ACE-Step 1.5 (Hugging Face) - https://huggingface.co/ACE-Step/Ace-Step1.5
Qwen3-TTS (GitHub) - https://github.com/QwenLM/Qwen3-TTS
ComfyUI-Qwen-TTS (ComfyUI Nodes) - https://github.com/flybirdxx/ComfyUI-Qwen-TTS
ComfyUI - https://www.comfy.org/
ElevenLabs - https://elevenlabs.io
Suno - https://suno.com
Google Gemini - https://gemini.google.com

Find Us
YouTube - https://www.youtube.com/@PandCpodcast
Bluesky - https://bsky.app/profile/pandcpodcast.bsky.social
X - https://x.com/_pandcpodcast
Instagram - https://www.instagram.com/_pandcpodcast
LinkedIn - https://www.linkedin.com/company/p-and-c-podcast
© 2025 Prompt and Circumstance