Commit d97ae65

whisper: set no_context to prevent quality drift over a session (#79)
* whisper: set no_context to prevent quality drift over a session

  Whisper transcription quality degrades progressively over a long push-to-talk session: short clips get mis-recognized or returned empty, and language detection sticks to the previous language (e.g. RU→EN switches keep producing Russian). Reloading the model restores quality.

  The cause is whisper.cpp's default `prompt_past` behaviour: the last decoded tokens are fed back as a prompt for the next decode. That is the right thing for continuous speech (lectures, meetings), where consecutive segments are connected, but the wrong thing for push-to-talk and similar workloads, where each call to transcribe is an independent utterance, so stale prompt tokens bias the next decode. Short clips suffer most because they have less acoustic evidence to overcome the stale prompt; language switches suffer because the prompt is in the previous language and steers detection.

  Set `no_context = true` so each decode starts from a clean prompt. The user-supplied `initial_prompt` continues to work: it goes through a different field and is unaffected.

* whisper: expose no_context as a configurable field

  Per review, make `no_context` an opt-in field on `WhisperInferenceParams` so callers can override it for continuous-speech use cases (lectures, meetings, streaming) where carrying `prompt_past` across segments improves consistency. The default stays `true`, the right choice for independent utterances such as push-to-talk dictation, which is the case the previous commit fixed.

* fmt

---------

Co-authored-by: CJ Pais <cj@cjpais.com>
1 parent 343768c commit d97ae65

2 files changed

Lines changed: 8 additions & 2 deletions

File tree

src/onnx/session.rs

Lines changed: 2 additions & 2 deletions
```diff
@@ -8,11 +8,11 @@ use ort::ep::ROCm;
 use ort::ep::TensorRT;
 #[cfg(feature = "ort-webgpu")]
 use ort::ep::WebGPU;
-#[cfg(feature = "ort-xnnpack")]
-use ort::ep::XNNPACK;
 use ort::ep::CPU;
 #[cfg(feature = "ort-cuda")]
 use ort::ep::CUDA;
+#[cfg(feature = "ort-xnnpack")]
+use ort::ep::XNNPACK;
 
 use ort::session::builder::GraphOptimizationLevel;
 use ort::session::Session;
```
src/whisper_cpp/mod.rs

Lines changed: 6 additions & 0 deletions
```diff
@@ -110,6 +110,10 @@ pub struct WhisperInferenceParams {
 
     /// Initial prompt to provide context to the model.
     pub initial_prompt: Option<String>,
+
+    /// Start each decode with a clean prompt (whisper.cpp's `prompt_past`).
+    /// Default `true` suits push-to-talk; set `false` for continuous speech.
+    pub no_context: bool,
 }
 
 impl Default for WhisperInferenceParams {
@@ -126,6 +130,7 @@ impl Default for WhisperInferenceParams {
             no_speech_thold: 0.2,
             n_threads: 0,
             initial_prompt: None,
+            no_context: true,
         }
     }
 }
@@ -220,6 +225,7 @@ impl WhisperEngine {
         full_params.set_suppress_blank(params.suppress_blank);
         full_params.set_suppress_nst(params.suppress_non_speech_tokens);
         full_params.set_no_speech_thold(params.no_speech_thold);
+        full_params.set_no_context(params.no_context);
         if params.n_threads > 0 {
             full_params.set_n_threads(params.n_threads);
         }
```
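Because `no_context` lives on `WhisperInferenceParams` with a `Default` impl, continuous-speech callers can opt back into prompt carry-over via struct update syntax. A minimal self-contained sketch, assuming only the field shapes visible in the diff (the real struct carries more inference knobs, elided here):

```rust
/// Reduced stand-in for the `WhisperInferenceParams` from the diff,
/// keeping only the fields relevant to this change.
#[derive(Debug, Clone)]
pub struct WhisperInferenceParams {
    /// Initial prompt to provide context to the model.
    pub initial_prompt: Option<String>,
    /// Start each decode with a clean prompt (whisper.cpp's `prompt_past`).
    pub no_context: bool,
}

impl Default for WhisperInferenceParams {
    fn default() -> Self {
        Self {
            initial_prompt: None,
            // Independent utterances (push-to-talk) by default.
            no_context: true,
        }
    }
}

fn main() {
    // Push-to-talk dictation: take the defaults.
    let ptt = WhisperInferenceParams::default();
    assert!(ptt.no_context);

    // Lectures / meetings / streaming: opt back into carrying
    // prompt_past across segments for consistency.
    let streaming = WhisperInferenceParams {
        no_context: false,
        ..Default::default()
    };
    assert!(!streaming.no_context);
    println!(
        "no_context: ptt={} streaming={}",
        ptt.no_context, streaming.no_context
    );
}
```

The `..Default::default()` spread keeps the override local: callers name only the field they change, so new params added later do not break existing call sites.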
