Adds a new Transcriber implementation that splits audio into fixed-duration chunks with configurable overlap. No VAD model or energy analysis is needed: this is the simplest chunking strategy for models with hard sequence-length limits (e.g. Conformer encoders). The overlap keeps a small tail from each chunk in the buffer so the next chunk starts with shared audio context, preventing garbled words at chunk boundaries. Uses the existing transcribe_padded() and merge_sequential_with_separator() infrastructure. Includes 12 unit tests covering splitting, overlap, timestamps, remainders, error recovery, and object safety. Defaults: 30 s chunks, 1 s overlap.

Session-Id: 2695c041-2969-46cd-b749-61636e27d352
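The overlap mechanism described above can be pictured as a streaming buffer. This is a minimal sketch, assuming mono `f32` samples and a hypothetical `OverlapBuffer` type that is not the PR's API. Once the buffer holds a full chunk, it emits that chunk together with its starting sample index (divide by the sample rate for a timestamp). It then drops everything except the last `overlap` samples, which become the head of the next chunk.

```rust
/// Hypothetical streaming chunker (illustrative only, not the PR's API).
struct OverlapBuffer {
    chunk: usize,      // chunk length in samples
    overlap: usize,    // tail length carried into the next chunk
    buf: Vec<f32>,     // samples not yet fully consumed
    buf_start: usize,  // absolute sample index of buf[0], used for timestamps
    emitted_any: bool, // whether at least one full chunk has been emitted
}

impl OverlapBuffer {
    fn new(chunk: usize, overlap: usize) -> Self {
        // Also rejects chunk == 0, which would otherwise loop forever.
        assert!(overlap < chunk, "overlap must be shorter than the chunk");
        Self { chunk, overlap, buf: Vec::new(), buf_start: 0, emitted_any: false }
    }

    /// Appends samples and returns every full chunk as (start_sample, samples).
    fn push(&mut self, samples: &[f32]) -> Vec<(usize, Vec<f32>)> {
        self.buf.extend_from_slice(samples);
        let mut out = Vec::new();
        while self.buf.len() >= self.chunk {
            out.push((self.buf_start, self.buf[..self.chunk].to_vec()));
            self.emitted_any = true;
            // Keep only the last `overlap` samples of the emitted chunk.
            let advance = self.chunk - self.overlap;
            self.buf.drain(..advance);
            self.buf_start += advance;
        }
        out
    }

    /// Flushes the remainder as a final, shorter chunk, but only if it holds
    /// audio that was not already sent as the tail of the previous chunk.
    fn finish(&mut self) -> Option<(usize, Vec<f32>)> {
        let already_sent = if self.emitted_any { self.overlap } else { 0 };
        if self.buf.len() <= already_sent {
            return None;
        }
        let start = self.buf_start;
        self.buf_start += self.buf.len();
        Some((start, std::mem::take(&mut self.buf)))
    }
}
```

With 30 s chunks and 1 s overlap, each chunk after the first consumes 29 s of fresh audio. `finish` avoids re-sending a tail that was already transcribed when the audio ends exactly on a chunk boundary.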
Owner

@humblemuzzu would you mind sharing the audio file you have? Or uploading it as part of your commit? Just curious.
Author

That was just my long prompt to Claude, pretty unhinged, but happy to share on X. DMed you.
Reworked from #71 per your feedback — chunking now lives in the Transcriber layer, not inside the model.
What
FixedChunked: a new Transcriber implementation that splits audio into fixed-duration chunks with configurable overlap. It sits alongside VadChunked and EnergyAdaptiveChunked. No VAD model, no energy analysis: the simplest chunking strategy for models with hard sequence-length limits.
Why overlap
EnergyAdaptiveChunked with search_window=0 gives you fixed-duration chunks, but with hard cuts. FixedChunked keeps a configurable tail from each chunk in the buffer so the next chunk starts with shared audio context. This prevents the model from seeing a hard cut mid-word at chunk boundaries.

Usage
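To make the hard-cut versus overlap contrast concrete, here is a minimal sketch that computes the sample range of each chunk. The function `chunk_ranges` is hypothetical and works on plain sample counts. With `overlap == 0` it produces back-to-back hard cuts. With `overlap > 0` each chunk starts `overlap` samples before the previous one ended, so the audio around every boundary is seen by two chunks.

```rust
use std::ops::Range;

/// Hypothetical helper: sample ranges for fixed-duration chunking.
/// `overlap == 0` reproduces hard cuts; `overlap > 0` shares audio across boundaries.
fn chunk_ranges(len: usize, chunk: usize, overlap: usize) -> Vec<Range<usize>> {
    assert!(overlap < chunk, "need 0 <= overlap < chunk");
    let step = chunk - overlap; // how far each chunk start advances
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + chunk).min(len); // last chunk may be shorter
        ranges.push(start..end);
        if end == len {
            break; // do not emit a chunk that only repeats the overlap tail
        }
        start += step;
    }
    ranges
}
```

For example, 65 samples with 30-sample chunks give `0..30, 30..60, 60..65` without overlap and `0..30, 29..59, 58..65` with a 1-sample overlap.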
Config
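The sketch below shows one way the stated defaults (30 s chunks, 1 s overlap) could be represented and validated. It assumes a hypothetical `FixedChunkConfig` struct with second-valued fields, so the PR's actual config type and field names may differ. The check that matters is `overlap < chunk` in samples. Without it, the chunk start would never advance.

```rust
/// Hypothetical config mirroring the stated defaults; names are illustrative.
#[derive(Debug, Clone)]
struct FixedChunkConfig {
    chunk_secs: f64,
    overlap_secs: f64,
}

impl Default for FixedChunkConfig {
    fn default() -> Self {
        Self { chunk_secs: 30.0, overlap_secs: 1.0 }
    }
}

impl FixedChunkConfig {
    /// Converts to (chunk, overlap) in samples, rejecting configs that would
    /// stall the chunker. NaN fails every comparison, so it is rejected too.
    fn to_samples(&self, sample_rate: u32) -> Result<(usize, usize), String> {
        let sr = sample_rate as f64;
        let chunk = (self.chunk_secs * sr).round();
        let overlap = (self.overlap_secs * sr).round();
        if !(chunk >= 1.0 && chunk.is_finite()) {
            return Err(format!("chunk must be at least one sample, got {} s", self.chunk_secs));
        }
        if !(overlap >= 0.0 && overlap < chunk) {
            return Err(format!(
                "overlap ({} s) must be non-negative and shorter than the chunk ({} s)",
                self.overlap_secs, self.chunk_secs
            ));
        }
        Ok((chunk as usize, overlap as usize))
    }
}
```

At 16 kHz the defaults come out to 480,000-sample chunks with a 16,000-sample overlap.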
What changed
src/transcriber/fixed_chunked.rs (413 lines)
src/transcriber/mod.rs: +3 lines (register, re-export, doc)

Uses the existing transcribe_padded() and merge_sequential_with_separator(). No new dependencies. Parakeet model internals untouched.

Tests
12 unit tests covering splitting, overlap, timestamps, remainders, error recovery, and object safety, using MockModel/FailOnNthModel with the same patterns as the existing transcriber tests.
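The following self-contained sketch illustrates the error-recovery and object-safety cases in the spirit of MockModel/FailOnNthModel. Every name here (`ChunkModel`, `ChunkTranscriber`, `FailOnNth`, `FixedChunkedSketch`) is a hypothetical stand-in for the crate's real types. Skipping a failed chunk instead of aborting is an assumed recovery policy and not necessarily what the PR does.

```rust
use std::cell::Cell;

/// Hypothetical model interface (stand-in for the crate's real trait).
trait ChunkModel {
    fn transcribe_chunk(&self, samples: &[f32]) -> Result<String, String>;
}

/// Mock that fails on call `n` (0-based) and otherwise echoes the chunk length.
struct FailOnNth {
    n: usize,
    calls: Cell<usize>, // Cell lets the mock count calls through `&self`
}

impl ChunkModel for FailOnNth {
    fn transcribe_chunk(&self, samples: &[f32]) -> Result<String, String> {
        let i = self.calls.get();
        self.calls.set(i + 1);
        if i == self.n {
            Err(format!("chunk {i} failed"))
        } else {
            Ok(format!("len{}", samples.len()))
        }
    }
}

/// No generic methods and a `&self` receiver, so this trait is object safe.
trait ChunkTranscriber {
    fn transcribe(&self, model: &dyn ChunkModel, audio: &[f32]) -> String;
}

struct FixedChunkedSketch {
    chunk: usize,
    overlap: usize,
}

impl ChunkTranscriber for FixedChunkedSketch {
    fn transcribe(&self, model: &dyn ChunkModel, audio: &[f32]) -> String {
        let step = self.chunk - self.overlap;
        let mut parts = Vec::new();
        let mut start = 0;
        while start < audio.len() {
            let end = (start + self.chunk).min(audio.len());
            // Assumed policy: log and skip a failed chunk, keep going.
            match model.transcribe_chunk(&audio[start..end]) {
                Ok(text) => parts.push(text),
                Err(e) => eprintln!("skipping chunk at sample {start}: {e}"),
            }
            if end == audio.len() {
                break;
            }
            start += step;
        }
        parts.join(" ")
    }
}
```

Boxing the strategy as `Box<dyn ChunkTranscriber>` only compiles because the trait is object safe. This mirrors what an object-safety test for the real `Transcriber` would check.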
Validated
Tested with an ~8-minute recording on parakeet-tdt-0.6b-v3-int8: 17 chunks, 8.9 s total, clean output with no garbled words at boundaries.
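Here is a rough consistency check on the chunk count. It assumes chunk starts advance by $C - O$ seconds ($C = 30$ s chunk, $O = 1$ s overlap) and that the recording length $D$ exceeds $C$. Under those assumptions, a recording of exactly 8 minutes ($D = 480$ s) gives:

```latex
N = 1 + \left\lceil \frac{D - C}{C - O} \right\rceil
  = 1 + \left\lceil \frac{480 - 30}{30 - 1} \right\rceil
  = 1 + \lceil 15.52 \rceil
  = 17
```

Any $D$ with $465 < D \le 494$ s yields the same 17 chunks, which fits "~8 minutes". If the 8.9 s figure is total processing time, that is roughly $480 / 8.9 \approx 54\times$ faster than real time.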