Interruption is already built in, but the stop is driven by the feedback loop from ASR: TTS playback stops only once the system detects ASR output, meaning the user's speech has been recognized. The latency of this design is roughly 2 to 3 seconds. We can explore voice activity detection (VAD) to improve stability.
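To make the VAD idea concrete, here is a minimal sketch of a stop trigger that fires on voice activity instead of waiting for ASR output. It is not the current implementation: it assumes a simple energy-based VAD over 20 ms frames of 16 kHz audio with samples normalized to [-1.0, 1.0], and the names `frame_rms`, `BargeInDetector`, `first_trigger`, and both threshold values are hypothetical. A trained VAD model could replace the energy check without changing the surrounding logic.

```python
import math
from typing import List, Optional, Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class BargeInDetector:
    """Hypothetical energy-based VAD that signals when TTS should stop.

    A frame counts as speech when its RMS energy reaches `energy_threshold`.
    The detector fires only after `min_speech_frames` consecutive speech
    frames, so a single click or pop does not interrupt playback.
    Both defaults are assumed values and would need tuning on real audio.
    """

    def __init__(self, energy_threshold: float = 0.02, min_speech_frames: int = 3):
        self.energy_threshold = energy_threshold
        self.min_speech_frames = min_speech_frames
        self._consecutive = 0

    def update(self, frame: Sequence[float]) -> bool:
        """Feed one frame; return True when TTS playback should stop."""
        if frame_rms(frame) >= self.energy_threshold:
            self._consecutive += 1
        else:
            self._consecutive = 0  # any quiet frame restarts the count
        return self._consecutive >= self.min_speech_frames

    def reset(self) -> None:
        """Clear state, e.g. when a new TTS utterance starts."""
        self._consecutive = 0


def first_trigger(frames: List[Sequence[float]], detector: BargeInDetector) -> Optional[int]:
    """Index of the first frame at which the detector fires, or None."""
    for i, frame in enumerate(frames):
        if detector.update(frame):
            return i
    return None


# Synthetic input: 20 ms frames at 16 kHz are 320 samples each.
silence = [0.0] * 320
speech = [0.1] * 320
frames = [silence] * 5 + [speech] * 5
stop_index = first_trigger(frames, BargeInDetector())
```

With these assumed settings the trigger needs three consecutive speech frames, about 60 ms of audio, before it fires. That figure excludes capture and buffering delay. Since the microphone may also pick up the TTS output itself, a real deployment would likely need echo cancellation or a higher threshold during playback.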