Ruiqi-Yan/Awesome-Full-Duplex-SDM

Awesome-FullDuplexSDM

A curated list of full-duplex spoken dialogue models.
Pull requests are welcome if you would like to add resources.

Models

| Title | Relevant Resources |
| --- | --- |
| X-Talk: On the Underestimated Potential of Modular Speech-to-Speech Dialogue System | arXiv/GitHub/Demo |
| MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models | arXiv |
| Seeduplex: Native Full-Duplex Speech LLM | Release/Blog |
| FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection | arXiv |
| JAL-Turn: Joint Acoustic-Linguistic Modeling for Real-Time and Robust Turn-Taking Detection in Full-Duplex Spoken Dialogue Systems | arXiv |
| TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving | arXiv/GitHub/Demo |
| Qwen3.5-Omni | Official Blog |
| Covo-Audio | arXiv/GitHub/Hugging Face |
| SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation | arXiv/GitHub/Demo |
| PHOENIX-VAD: STREAMING SEMANTIC ENDPOINT DETECTION FOR FULL-DUPLEX SPEECH INTERACTION | arXiv |
| EASY TURN: INTEGRATING ACOUSTIC AND LINGUISTIC MODALITIES FOR ROBUST TURN-TAKING IN FULL-DUPLEX SPOKEN DIALOGUE SYSTEMS | arXiv/GitHub/Demo |
| Fun-Audio-Chat | arXiv/GitHub/Demo |
| FireRedChat: A Pluggable, Full-Duplex Voice Interaction System with Cascaded and Semi-Cascaded Implementations | arXiv/GitHub/Demo |
| PERSONAPLEX: VOICE AND ROLE CONTROL FOR FULL DUPLEX CONVERSATIONAL SPEECH MODELS | arXiv/GitHub/Demo |
| VITA: Towards Open-Source Interactive Omni Multimodal LLM | arXiv/GitHub/Demo |
| A Full-duplex Speech Dialogue Scheme Based On Large Language Models | arXiv |
| Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM | arXiv/GitHub |
| Moshi: a speech-text foundation model for real-time dialogue | arXiv/GitHub |
| FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems | arXiv |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | arXiv/Demo |
| SoulX-DuoVoice | Unofficial Intro |
| SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation | arXiv/GitHub |
| CleanS2S: Single-file Framework for Proactive Speech-to-Speech Interaction | arXiv/GitHub |
| Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities | arXiv/GitHub |
| Real-Time Textless Dialogue Generation | arXiv/GitHub/Demo |
| Language Model Can Listen While Speaking | arXiv/Demo |
| Parrot: Seamless Spoken Dialogue Interaction with Double-Channel Large Language Models | Paper |
| Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents | arXiv/Demo |
| LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems | arXiv |
| Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems | arXiv |
| Generative Spoken Dialogue Language Modeling | arXiv/GitHub/Demo |
| Duplex Conversation in Outbound Agent System | Paper |
| OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation | arXiv/Demo |
| DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | arXiv/GitHub |
| Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models | arXiv/GitHub/Hugging Face |

Benchmark

| Title | Relevant Resources |
| --- | --- |
| Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency | arXiv/GitHub/Demo |
| Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner | arXiv/GitHub |
| FULL-DUPLEX-BENCH V1.5: Evaluating Overlap Handling for Full-Duplex Speech Models | arXiv/GitHub |
| Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities | arXiv/GitHub |
| MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models | arXiv |
| FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems | arXiv/GitHub/Dataset |
| Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics | arXiv |
