Overview
The voice-to-text feature does not reliably transcribe long audio recordings (10–15 minutes or more). The likely cause is improper handling of long-duration audio, possibly missing or faulty logic for chunking, buffering, or transcription submission. As a result, transcriptions are often incomplete or missing entirely, which affects users who rely on long-form voice input.
Affected Endpoint and Files
- Endpoint(s): (Not applicable – client-side feature issue)
- Route file: (N/A)
- Controller: (N/A)
- Middleware (if applicable): (N/A)
- Service/DAO: (N/A)
- Client Components:
  - nextjs/src/components/Chat/VoiceChat.tsx
  - nextjs/src/components/Chat/ChatClone.tsx
Evidence
- Voice recordings longer than ~10–15 minutes often fail to return a full transcription.
- Either:
  - No transcription is added to the chat, or
  - Only the beginning portion of the audio is transcribed, with the rest missing.
- Users report this issue consistently during long recordings.
Steps to Reproduce
- Open the chat interface in the app.
- Start a voice recording session.
- Speak continuously for 10–15 minutes.
- Stop the recording and wait for transcription to complete.
- Check the resulting text in the chat window.
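Expected Behavior
- The full recording is transcribed and the complete text is added to the chat.
- If transcription fails, the user receives feedback or an error message.
Actual Behavior
- Transcription is missing or incomplete.
- Only the first few minutes are transcribed, or the chat receives no text at all.
- Users do not receive feedback or error messages.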
Proposed Remediation
- Support audio durations of 10–15+ minutes in both capture and transcription.
- If using audio chunking, ensure all chunks are received, stored, and concatenated correctly.
- Check transcription submission logic for failures with large inputs.
- Add temporary debug logs to identify where data is lost (recording, buffering, upload, transcription).
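The chunk-handling and debug-logging points above can be sketched as follows. This is a minimal illustration, not the current implementation: `AudioChunk` and `ChunkCollector` are hypothetical names that do not come from VoiceChat.tsx, and the sketch assumes each chunk carries a sequence index assigned at capture time. It shows one way to log every received chunk, detect gaps, and refuse to concatenate an incomplete recording.

```typescript
// Hypothetical helper for illustration only; these names are assumptions
// and are not taken from VoiceChat.tsx or ChatClone.tsx.
interface AudioChunk {
  index: number;    // position of the chunk within the recording (0-based)
  data: Uint8Array; // raw bytes of the chunk
}

class ChunkCollector {
  private chunks = new Map<number, Uint8Array>();

  // Store a chunk and emit a temporary debug log so lost chunks are visible.
  add(chunk: AudioChunk): void {
    this.chunks.set(chunk.index, chunk.data);
    console.debug(
      `[voice] chunk ${chunk.index} received (${chunk.data.byteLength} bytes)`
    );
  }

  // Return the indices in 0..expectedCount-1 that were never received.
  missing(expectedCount: number): number[] {
    const gaps: number[] = [];
    for (let i = 0; i < expectedCount; i++) {
      if (!this.chunks.has(i)) gaps.push(i);
    }
    return gaps;
  }

  // Concatenate chunks in index order; throw instead of silently
  // submitting a truncated recording for transcription.
  concat(expectedCount: number): Uint8Array {
    const gaps = this.missing(expectedCount);
    if (gaps.length > 0) {
      throw new Error(`Missing audio chunks: ${gaps.join(", ")}`);
    }
    let total = 0;
    for (let i = 0; i < expectedCount; i++) {
      total += this.chunks.get(i)!.byteLength;
    }
    const out = new Uint8Array(total);
    let offset = 0;
    for (let i = 0; i < expectedCount; i++) {
      const part = this.chunks.get(i)!;
      out.set(part, offset);
      offset += part.byteLength;
    }
    return out;
  }
}
```

If the recorder is a browser `MediaRecorder` started with a timeslice, each `dataavailable` event delivers a `Blob` whose bytes can be read with `blob.arrayBuffer()` and passed to `add`. Whether the component records this way is an assumption to verify against the code. Surfacing the thrown error in the UI would also address the missing user feedback.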