| X-Talk: On the Underestimated Potential of Modular Speech-to-Speech Dialogue System |
arXiv/Github/Demo |
| MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models |
arXiv |
| Seeduplex: Native Full-Duplex Speech LLM |
Release/Blog |
| FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection |
arXiv |
| JAL-Turn: Joint Acoustic-Linguistic Modeling for Real-Time and Robust Turn-Taking Detection in Full-Duplex Spoken Dialogue Systems |
arXiv |
| TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving |
arXiv/Github/Demo |
| Qwen3.5-Omni |
Official Blog |
| Covo-Audio |
arXiv/Github/HuggingFace |
| SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation |
arXiv/Github/Demo |
| Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction |
arXiv |
| Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems |
arXiv/Github/Demo |
| Fun-Audio-Chat |
arXiv/Github/Demo |
| FireRedChat: A Pluggable, Full-Duplex Voice Interaction System with Cascaded and Semi-Cascaded Implementations |
arXiv/Github/Demo |
| PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models |
arXiv/Github/Demo |
| VITA: Towards Open-Source Interactive Omni Multimodal LLM |
arXiv/Github/Demo |
| A Full-duplex Speech Dialogue Scheme Based On Large Language Models |
arXiv |
| Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM |
arXiv/Github |
| Moshi: a speech-text foundation model for real-time dialogue |
arXiv/Github |
| FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems |
arXiv |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction |
arXiv/Demo |
| SoulX-DuoVoice |
Unofficial Intro |
| SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation |
arXiv/Github |
| CleanS2S: Single-file Framework for Proactive Speech-to-Speech Interaction |
arXiv/Github |
| Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities |
arXiv/Github |
| Real-Time Textless Dialogue Generation |
arXiv/Github/Demo |
| Language Model Can Listen While Speaking |
arXiv/Demo |
| Parrot: Seamless Spoken Dialogue Interaction with Double-Channel Large Language Models |
Paper |
| Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents |
arXiv/Demo |
| LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems |
arXiv |
| Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems |
arXiv |
| Generative Spoken Dialogue Language Modeling |
arXiv/Github/Demo |
| Duplex Conversation in Outbound Agent System |
Paper |
| OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation |
arXiv/Demo |
| DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities |
arXiv/Github |
| Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models |
arXiv/Github/HuggingFace |