Use of transcribe feature for audio - speaker detection? #1455
Replies: 2 comments
Hey @Stryfe-Delivery, short answer: not currently supported in markitdown. MarkItDown's audio pipeline uses speech_recognition for basic speech-to-text only; it outputs a flat transcript with no speaker labels, and there is no diarization layer in the pipeline.

Workaround for speaker detection: run a dedicated diarization tool before or after markitdown. Once you have a diarized transcript, you can feed the resulting text into markitdown or directly to your LLM.

This is a feature gap worth opening as a feature request on the markitdown repo: the speech_recognition backend would need to be replaced or augmented with a diarization-capable ASR pipeline.

👍 If this helped you, please mark it as the answer so others in the community who run into the same issue can find the solution faster!
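To illustrate the "feed the resulting text into markitdown or directly to your LLM" step: once an external diarization tool gives you labeled segments, stitching them into a markdown transcript is a few lines of Python. A minimal sketch, assuming a simple `(speaker, text)` segment format (the `to_markdown_transcript` helper and the segment format are hypothetical, not part of markitdown):

```python
# Sketch: merge diarized segments (from an external diarization tool)
# into a markdown dialogue transcript. The (speaker, text) tuple format
# is an assumption -- adapt it to whatever your diarization tool emits.

def to_markdown_transcript(segments):
    """Render [(speaker, text), ...] as a markdown dialogue, one
    bolded speaker label per utterance."""
    lines = []
    for speaker, text in segments:
        lines.append(f"**{speaker}:** {text.strip()}")
    return "\n\n".join(lines)

segments = [
    ("SPEAKER_00", "Welcome everyone, let's get started."),
    ("SPEAKER_01", "Thanks. First item is the release schedule."),
]
print(to_markdown_transcript(segments))
```

The resulting markdown can be handed straight to an LLM, or saved and run through markitdown alongside your other documents.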
I don't think so. From the code, MarkItDown does basic audio transcription, but I'm not seeing speaker diarization or speaker labels. So you can get a transcript, but not automatic speaker detection. At the moment: transcription yes, speaker detection no.
In some tools (a simple example is OneNote) you can transcribe audio and it will flag each dialog line as [Speaker 1], [Speaker 2], etc. based on the speaker's inflection and some other metadata. You can then find/replace with the speaker names and have a full transcript of a meeting.
Is there any way to do that with this tool that I've missed, or is that a capability not yet available? The audio samples I tried came back transcribed, but with no indicators that the speaker had changed during the discussion.
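For what it's worth, the find/replace step described above is easy to script once a transcript carries generic labels. A minimal sketch, assuming a `[Speaker N]` label format like OneNote's (the `rename_speakers` helper is hypothetical, not part of any tool mentioned here):

```python
# Sketch: replace generic [Speaker N] labels in a transcript with real
# names. The label format is an assumption -- adjust it to match
# whatever your transcription tool actually emits.

def rename_speakers(transcript, names):
    """names maps a generic label like '[Speaker 1]' to a real name."""
    for label, name in names.items():
        transcript = transcript.replace(label, name)
    return transcript

transcript = "[Speaker 1] Hello there.\n[Speaker 2] Hi, good to see you."
named = rename_speakers(
    transcript,
    {"[Speaker 1]": "Alice:", "[Speaker 2]": "Bob:"},
)
print(named)
```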