
feat: add FixedChunked transcriber #72

Open
humblemuzzu wants to merge 1 commit into cjpais:main from humblemuzzu:fix/fixed-chunked-transcriber

@humblemuzzu

Reworked from #71 per your feedback — chunking now lives in the Transcriber layer, not inside the model.

What

FixedChunked — a new Transcriber implementation that splits audio into fixed-duration chunks with configurable overlap. Sits alongside VadChunked and EnergyAdaptiveChunked.

No VAD model, no energy analysis. The simplest chunking strategy for models with hard sequence-length limits.

Why overlap

EnergyAdaptiveChunked with search_window=0 gives you fixed-duration chunks, but with hard cuts. FixedChunked keeps a configurable tail from each chunk in the buffer so the next chunk starts with shared audio context. This prevents the model from seeing a hard cut mid-word at chunk boundaries.
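
The tail-retention idea can be sketched roughly as follows. This is a minimal illustration, not the code in fixed_chunked.rs: `OverlapBuffer`, `push`, and `finish` here are hypothetical names, and lengths are expressed in samples rather than seconds.

```rust
/// Hypothetical stand-in for the chunker's internal buffer (not the PR's code).
struct OverlapBuffer {
    buf: Vec<f32>,
    chunk_len: usize,   // samples per chunk
    overlap_len: usize, // samples shared between consecutive chunks
}

impl OverlapBuffer {
    fn new(chunk_len: usize, overlap_len: usize) -> Self {
        assert!(chunk_len > 0, "chunk_len must be non-zero");
        // Guarantees every emitted chunk consumes at least one new sample.
        assert!(overlap_len < chunk_len, "overlap must be shorter than a chunk");
        Self { buf: Vec::new(), chunk_len, overlap_len }
    }

    /// Append samples and return every full chunk that became available.
    fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.buf.extend_from_slice(samples);
        let mut chunks = Vec::new();
        while self.buf.len() >= self.chunk_len {
            chunks.push(self.buf[..self.chunk_len].to_vec());
            // Keep only the last `overlap_len` samples of the emitted chunk,
            // so the next chunk starts with shared audio context.
            let consumed = self.chunk_len - self.overlap_len;
            self.buf.drain(..consumed);
        }
        chunks
    }

    /// Return whatever is still buffered (the remainder), if anything.
    fn finish(&mut self) -> Option<Vec<f32>> {
        if self.buf.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buf))
        }
    }
}
```

With `overlap_len = 0` the loop degenerates to hard cuts. Note that in this sketch the remainder can consist of nothing but the retained tail; a real implementation may want to skip that case.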

Usage

use transcribe_rs::transcriber::{FixedChunked, FixedChunkedConfig, Transcriber};

let config = FixedChunkedConfig::default(); // 30s chunks, 1s overlap
let mut chunker = FixedChunked::new(config, TranscribeOptions::default());
let result = chunker.transcribe(&mut model, &samples)?;

Config

pub struct FixedChunkedConfig {
    pub chunk_duration_secs: f32,  // default 30.0
    pub overlap_secs: f32,         // default 1.0 (0.0 for hard cuts)
    pub padding_secs: f32,         // default 0.0
    pub min_chunk_secs: f32,       // default 0.0
    pub merge_separator: String,   // default " "
}
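
Overriding a single field might look like the sketch below. To keep it compilable without the crate, it redeclares a stand-alone mirror of the struct with the defaults from the comments above; `hard_cut_config` is a hypothetical helper, and the struct-update syntax assumes the real type is not marked `#[non_exhaustive]`.

```rust
// Stand-alone mirror of FixedChunkedConfig, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedChunkedConfig {
    pub chunk_duration_secs: f32,
    pub overlap_secs: f32,
    pub padding_secs: f32,
    pub min_chunk_secs: f32,
    pub merge_separator: String,
}

impl Default for FixedChunkedConfig {
    fn default() -> Self {
        // Defaults as listed in the PR description.
        Self {
            chunk_duration_secs: 30.0,
            overlap_secs: 1.0,
            padding_secs: 0.0,
            min_chunk_secs: 0.0,
            merge_separator: " ".to_string(),
        }
    }
}

/// Hypothetical helper: fixed-duration chunks with hard cuts (no overlap).
pub fn hard_cut_config(chunk_duration_secs: f32) -> FixedChunkedConfig {
    FixedChunkedConfig {
        chunk_duration_secs,
        overlap_secs: 0.0,
        // Every other field keeps its default value.
        ..Default::default()
    }
}
```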

What changed

  • New file: src/transcriber/fixed_chunked.rs (413 lines)
  • src/transcriber/mod.rs: +3 lines (register, re-export, doc)

Uses existing transcribe_padded() and merge_sequential_with_separator(). No new dependencies. Parakeet model internals untouched.

Tests

12 unit tests using MockModel/FailOnNthModel, same patterns as the existing transcriber tests:

  • Splitting at chunk duration
  • Overlap retains tail correctly
  • Remainder handled in finish()
  • min_chunk_secs skips short remainders
  • Timestamps correct with and without overlap
  • Timestamp clamping with padding
  • Empty input
  • Object safety (Box<dyn Transcriber>)
  • Reusable after error
  • Short audio single pass

Validated

Tested with an ~8 minute recording on parakeet-tdt-0.6b-v3-int8. 17 chunks, 8.9s total, clean output with no garbled words at boundaries.
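
As a sanity check on the chunk count, here is a small sketch of the arithmetic. It assumes each chunk starts `chunk - overlap` seconds after the previous one and that any trailing remainder becomes one final chunk; `count_chunks` is a hypothetical helper, and 480 s stands in for the "~8 minute" recording, whose exact length is not stated.

```rust
/// Count the chunks produced for a recording of `total_secs`.
/// Assumes a stride of `chunk_secs - overlap_secs` and that the
/// trailing remainder (if any) is emitted as one last chunk.
fn count_chunks(total_secs: f32, chunk_secs: f32, overlap_secs: f32) -> usize {
    assert!(overlap_secs >= 0.0 && chunk_secs > overlap_secs);
    let stride = chunk_secs - overlap_secs;
    let mut start = 0.0_f32;
    let mut n = 0;
    // Full-length chunks.
    while start + chunk_secs <= total_secs {
        n += 1;
        start += stride;
    }
    // Whatever is left after the last full chunk's stride.
    if total_secs > start {
        n += 1;
    }
    n
}
```

Under these assumptions, `count_chunks(480.0, 30.0, 1.0)` gives 17 (16 full chunks plus a remainder), which matches the count reported above; with hard cuts (`overlap_secs = 0.0`) the same recording would give 16.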

Adds a new Transcriber implementation that splits audio into
fixed-duration chunks with configurable overlap. No VAD model or
energy analysis needed — the simplest chunking strategy for models
with hard sequence-length limits (e.g. Conformer encoders).

The overlap keeps a small tail from each chunk in the buffer so
the next chunk starts with shared audio context, preventing garbled
words at chunk boundaries.

Uses the existing transcribe_padded() and merge_sequential_with_separator()
infrastructure. Includes 12 unit tests covering splitting, overlap,
timestamps, remainders, error recovery, and object safety.

Defaults: 30s chunks, 1s overlap.
Session-Id: 2695c041-2969-46cd-b749-61636e27d352
@cjpais
Owner

cjpais commented Mar 27, 2026

@humblemuzzu would you mind sharing the audio file you have? Or uploading it as part of your commit? Just curious

@humblemuzzu
Author

That was just my long prompt to Claude, pretty unhinged, but happy to share on X. DMed you.
