A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Updated Feb 18, 2026 · Python
A ComfyUI custom node integration for local, multi-engine, multi-language Text-to-Speech and Voice Conversion. Supports RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, character support, and many audio tools.
ComfyUI custom node for the VibeVoice TTS. Expressive, long-form, multi-speaker conversational audio
VibeVoiceFusion is a full-stack, multi-speaker voice generation web system featuring LoRA fine-tuning, batch generation, and VRAM optimization. Based on Microsoft's VibeVoice (AR + diffusion architecture)
A fully local and private Speech-To-Text app with cross-platform support, speaker diarization, Audio Notebook mode, LM Studio integration, and both longform and live transcription.
Audiobook creation tool supporting multiple TTS models (Qwen3-TTS, IndexTTS2, VibeVoice, Chatterbox, Fish S2-Pro, Higgs Audio V2, etc.), focused on high-quality output. Plus a player/reader web app and a standalone server component.
Beautiful voice app: record or upload to train a voice, generate speech from text or files, save & download voices.
Archive of the official Microsoft VibeVoice repository (7B & 1.5B). Backup of the deleted source code for the open-source TTS models, including the removed 7B version. Try the VibeVoice online service
A Gradio-based demo for end-to-end vision-to-speech inference: Extract text or descriptions from images using Qwen2.5-VL-7B-Instruct, then convert to natural speech audio via Microsoft VibeVoice-Realtime-0.5B.
HOW TO RUN MICROSOFT VIBEVOICE LOCALLY
Create multi-voice podcasts with AI text-to-speech
🐟 Enhance communication with Fish Speech, a powerful multilingual Text-to-Speech system featuring speaker management, auto-transcription, and emotion control.
A suckless, high-performance CLI tool for audio transcription using Microsoft VibeVoice-ASR.
Simplified scripts for fine-tuning VibeVoice speech synthesis models with LoRA. Painless fine-tuning with reasonable defaults, supporting both local GPU and Google Colab workflows.
A ready-to-use Google Colab notebook for running the open-source VibeVoice TTS model from Microsoft, using the quantized Large Q8 variant (~12 GB VRAM) for multi-speaker long-form audio generation
🎙️ Enhance voice synthesis with ComfyUI-Qwen3-TTS, featuring advanced voice cloning, emotion-aware ASR, and unlimited multi-role dubbing.
Synthesize studio-quality AI voiceovers, clone voices in seconds, and translate subtitles in 200+ languages, all fully offline on your device.
🌊 Simplify configuration with VIBE, a readable, fast format that eliminates complexity while enhancing clarity and structure in your development workflow.