whisper: set no_context to prevent quality drift over a session #79

Merged
cjpais merged 3 commits into cjpais:main from anton-averich:fix/whisper-no-context
Apr 8, 2026

Conversation

@anton-averich
Contributor

Problem

Whisper transcription quality degrades progressively over a long push-to-talk session in Handy:

  • Short clips (one or two words dictated into a different chat) frequently get mis-recognized or returned as empty.
  • Language detection sticks to the previous language. After dictating in Russian, the next English utterance often comes back transcribed as Russian.
  • Reloading the model fully restores quality.
  • Punctuation also tends to drift in the direction of whatever was dictated earlier.

I've been hitting this for a while and was reloading Handy a couple of times a day to clear it.

Cause

whisper.cpp's whisper_full defaults to using prompt_past — the last decoded tokens are fed back as a prompt for the next decode. That is the right thing for continuous speech (lectures, meetings) where consecutive segments are connected. It is the wrong thing for push-to-talk and similar workloads where each call to transcribe is an independent utterance: residue from prior, unrelated decodes biases the next one.

  • Short clips suffer most because they have less acoustic evidence to overcome the stale prompt.
  • Language switches suffer because the prompt is in the previous language and steers detection.
  • Punctuation drift is a side-effect of the same mechanism — the model imitates the style of the (stale) prompt.

Fix

Call FullParams::set_no_context(true) in WhisperEngine::infer so each decode starts from a clean prompt. It's a one-line change.

The user-supplied initial_prompt is unaffected — it goes through a different FullParams field. I verified this with the existing test_prompt_product_names test, which still passes (it asserts that initial_prompt influences the output, and it does).

Performance

If anything, slightly cheaper — fewer prompt tokens for the decoder to process at the start of each decode. No allocations, no API change.

Compatibility

Public API unchanged. If anyone needs the old behaviour for streaming or continuous-speech use cases, happy to expose no_context as an opt-in field on WhisperInferenceParams in a follow-up.

Testing

  • cargo test --features whisper-cpp — all 3 whisper tests pass (test_jfk_transcription, test_prompt_product_names, test_timestamps).
  • Built Handy locally against this patch and used it for a full day of normal push-to-talk dictation. The drift symptoms above no longer reproduce.

Whisper transcription quality degrades progressively over a long
push-to-talk session: short clips get mis-recognized or returned
empty, and language detection sticks to the previous language
(e.g. RU→EN switches keep producing Russian). Reloading the model
restores quality.

The cause is whisper.cpp's default prompt_past behaviour — the last
decoded tokens are fed back as a prompt for the next decode. That's
the right thing for continuous speech (lectures, meetings) where
consecutive segments are connected, but the wrong thing for
push-to-talk and similar workloads where each call to transcribe
is an independent utterance: stale prompt tokens bias the next
decode. Short clips suffer most because they have less acoustic
evidence to overcome the stale prompt; language switches suffer
because the prompt is in the previous language and steers detection.

Set no_context = true so each decode starts from a clean prompt.
The user-supplied initial_prompt continues to work — it goes through
a different field and is unaffected.
@cjpais
Owner

cjpais commented Apr 8, 2026

I think we can definitely add this, but we need to add it as an option someone can change, since this is a library. We can have the default be what you suggest.

@anton-averich
Contributor Author

Makes sense, thanks for the quick reply. I'll add it as a field on WhisperInferenceParams with the default set to true and push to this branch.

Per review, make no_context an opt-in field on WhisperInferenceParams
so callers can override it for continuous-speech use cases (lectures,
meetings, streaming) where carrying prompt_past across segments
improves consistency. Default stays true — the right choice for
independent utterances such as push-to-talk dictation, which is the
case the previous commit fixed.
@anton-averich
Contributor Author

Done! Added no_context: bool to WhisperInferenceParams with true as the default, so the previous commit's behaviour carries over for anyone using ..Default::default(). Continuous-speech callers can now opt out:

WhisperInferenceParams { no_context: false, ..Default::default() }

Thanks again for the quick review! Happy to tweak naming or docs if you'd prefer something different.

@cjpais cjpais merged commit d97ae65 into cjpais:main Apr 8, 2026
4 checks passed
@cjpais
Owner

cjpais commented Apr 8, 2026

Thank you!
