
add nemotron streaming #36

Open
andrewleech wants to merge 4 commits into cjpais:main from andrewleech:feat/nemotron-streaming

Conversation

@andrewleech
Contributor

andrewleech commented Feb 16, 2026

Summary

Adds streaming transcription via parakeet-rs's Nemotron engine behind a nemotron-streaming feature flag. The StreamingTranscriptionEngine trait provides push_samples, get_transcript, and reset — designed for showing partial transcription during recording in Whispering/epicenter. Happy to adjust the API if a different shape would be preferred.
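To make the trait shape concrete, here is a minimal sketch of the three-method API described above. The method names come from the PR text, but the exact signatures, error type, and return values are assumptions; `EchoEngine` is a hypothetical stand-in showing only how a consumer would drive the trait, not a real decoder.

```rust
/// Sketch of the streaming trait described in the PR (signatures assumed).
pub trait StreamingTranscriptionEngine {
    /// Feed a chunk of mono f32 samples; returns the text decoded for this
    /// chunk (assumed return type).
    fn push_samples(&mut self, samples: &[f32]) -> Result<String, String>;
    /// Full transcript accumulated since the last reset.
    fn get_transcript(&self) -> String;
    /// Clear accumulated state so a new utterance can start.
    fn reset(&mut self);
}

/// Trivial in-memory stand-in; a real engine would run the Nemotron decoder.
struct EchoEngine {
    transcript: String,
}

impl StreamingTranscriptionEngine for EchoEngine {
    fn push_samples(&mut self, samples: &[f32]) -> Result<String, String> {
        // No real decoding here: just record the chunk size as "text".
        let piece = format!("[{} samples]", samples.len());
        self.transcript.push_str(&piece);
        Ok(piece)
    }
    fn get_transcript(&self) -> String {
        self.transcript.clone()
    }
    fn reset(&mut self) {
        self.transcript.clear();
    }
}
```

A recording loop would call `push_samples` per audio chunk to display partials, read `get_transcript` for the running text, and call `reset` between utterances.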

Audio resampling utilities (mix_to_mono, create_resampler, resample_chunk) are included behind a resampling feature that nemotron-streaming depends on. These were previously duplicated in the downstream consumer.
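As a rough illustration of what such a utility does, here is a sketch of an interleaved-to-mono downmix in the spirit of the `mix_to_mono` helper mentioned above. The signature and averaging strategy are assumptions, not the crate's actual API.

```rust
/// Downmix interleaved multi-channel f32 audio to mono by averaging each
/// frame's channels. Hypothetical sketch; the real utility may differ.
fn mix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels) // one frame (all channels of one sample) per chunk
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```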

The ort/ndarray bump (rc.10→rc.11, ndarray 0.16→0.17) is in the first commit. This replicates #27 on current master with the broader testing across all ONNX engines that was requested on that PR. The upgrade required API migrations across all ONNX engines — sense_voice had two additional issues: input.name became private, and metadata().custom(key) returns empty string for absent keys instead of None.
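The second sense_voice issue describes a behavioral change: a metadata lookup that used to yield `Option` now yields an empty string for absent keys. One way to adapt without touching the call sites' logic is a small normalization shim; `custom_meta_opt` is a hypothetical helper for illustration, not part of the ort API.

```rust
/// Restore Option-based semantics over a metadata lookup that now returns
/// an empty String for missing keys (hypothetical shim, not an ort API).
fn custom_meta_opt(value: String) -> Option<String> {
    if value.is_empty() {
        None
    } else {
        Some(value)
    }
}
```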

Full disclosure: I'm not very experienced with Rust; my focus is more in the embedded / Python space. This PR was prepared with a lot of involvement from Claude Code, which I hope is acceptable. My apologies if the design / style is at odds with the project; I've made efforts to integrate it cleanly.

Relates to #31, #27.

Re: streaming API spec (#4)

I saw @Leftium's streaming API proposal in #4 after implementing this. The current API is simpler — push_samples merges accept/decode/get_result into one call returning a String, whereas the spec proposes a 4-method pull-based loop with a structured Transcript return type carrying is_final, is_endpoint, timing, confidence, etc.

The main gaps vs that spec are: no input_finished() flush, no is_endpoint() for speech boundary detection, and no structured result type. The spec's design makes more sense once there are multiple streaming backends with different chunking requirements — with only Nemotron so far, the simpler API was sufficient for the Whispering use case. Happy to refactor toward that spec if desired.

Testing

Tested on Linux (WSL2, Zen 4 / Ryzen AI 9 365) with downloaded models.

ort rc.11 migration — existing engines:

  • parakeet: pass (2 tests)
  • moonshine: pass
  • sense_voice: pass (when using int8 model — the existing test silently skips due to a pre-existing path mismatch, see note below)

nemotron-streaming (5 tests):

  • streaming transcription of JFK audio in 560ms chunks
  • reset clears accumulated transcript
  • concatenated incremental returns match get_transcript()
  • model-free tests: empty transcript on fresh engine, error on push without model

audio utilities (13 unit tests): mix_to_mono, create_resampler, resample_chunk. No models required.

whisper: not tested — whisper-medium on llvmpipe software Vulkan ran 1h+ without completing a single transcription. Needs real GPU hardware. No code changes to whisper engine in this branch.

openai: API call succeeds but the existing exact-string assertion fails — OpenAI's model now returns slightly different punctuation. Pre-existing issue.

Note: pre-existing issues found during testing

These predate this branch, can fix in a follow-up:

  • Moonshine README URLs: HuggingFace restructured the repo — files moved from .../onnx/merged/{variant}/ to .../onnx/merged/{variant}/float/
  • SenseVoice test: expects FP32 model at models/sense-voice but only int8 is available as a packaged download, so the test silently skips

@cjpais
Owner

cjpais commented Feb 17, 2026

Thank you for the PR.

Bumping to rc.11 will kill Intel MacOS builds in Handy, right now I'm hesitant to pull in this version bump. I need some time to think about this, I may end up doing a final release for Intel Macs and dropping new feature support for them.

Also for now, it would be best to drop the streaming interface side of changes. I want to continue to evaluate the best streaming interface. But it would still be great to have offline support regardless. I'll probably use what you've done also as a reference.

@andrewleech
Contributor Author

andrewleech commented Feb 17, 2026

@cjpais thanks for the feedback - I didn't realise the ort bump was a breaking change for older Macs!

I can split this into separate ort and streaming PRs if that helps. I've implemented the local-specific streaming interface for my own use, but thought I'd share it in case it's useful; no pressure to merge it, certainly. The ort upgrade was just to support the parakeet-rs library.

For some reason I thought Whispering/epicenter was a closer fit to my workflow, but after using it for a couple of days I don't think that's true; Handy might be just as close. I'll take another look at whether what I want fits there better!

@cjpais
Owner

cjpais commented Feb 17, 2026

No worries! Thank you! Yeah, it would be amazing to have the streaming PR separately. I totally understand the need for it :) I want it too; I just need to take my time, review it, and reason a bit on my own. I'm doing some major refactoring and want to make sure all the interface boundaries are clear and solid, on top of all the other issues and features I'm trying to support in Handy.

If only ort rc.11 works for Nemotron streaming, we might just have to break Intel support on macOS (or at least that's what someone suggested in cjpais/Handy#436). I know the newer ort does solve some issues on other machines. I will need to evaluate this, as it might break downstream projects as well (like Whispering).

@Leftium

Leftium commented Feb 17, 2026

> Re: streaming API spec (#4)
>
> I saw @Leftium's streaming API proposal in #4 after implementing this. The current API is simpler — push_samples merges accept/decode/get_result into one call returning a String, whereas the spec proposes a 4-method pull-based loop with a structured Transcript return type carrying is_final, is_endpoint, timing, confidence, etc.
>
> The main gaps vs that spec are: no input_finished() flush, no is_endpoint() for speech boundary detection, and no structured result type. The spec's design makes more sense once there are multiple streaming backends with different chunking requirements — with only Nemotron so far, the simpler API was sufficient for the Whispering use case. Happy to refactor toward that spec if desired.

While simplifying and breaking up my API proposal, some details were lost (in unwritten sub-specs).

Probably the most important part of my spec is the structured Transcript return type. If only one part of the spec is implemented, it should be this.

I actually specified two different APIs: one for engine implementors and one for transcribe-rs users:

  1. Low-level pull-based StreamingTranscriptionEngine Interface
    • Most engine implementors should implement to this interface
    • Labeled (B) in current spec diagram
  • The implementations should be mostly thin wrappers over the underlying APIs (Nemotron, Vosk, etc.)
  2. High Level: callback-Based StreamingTranscriptionSource
    • Most transcribe-rs users should use this interface
    • Labeled (C) in the current spec diagram
    • The (Nemotron) StreamingTranscriptionSource is automatically available "for free" if the (Nemotron) StreamingTranscriptionEngine was implemented.

Results:

  • minimal effort for both new engine implementors and transcribe-rs consumers
  • unified streaming API: a single API for multiple engines

The lost details still live in previous versions of the spec; these simply need to be moved into the sub-specs:

Originally I specified the types/interfaces in Rust. However, I converted them to pseudocode in case details like 'static were over-specified. (I am not familiar with Rust.)
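One possible reading of the structured Transcript result the spec centers on, sketched in Rust: the field names are assumptions drawn from the discussion above (is_final, is_endpoint, timing, confidence), not the actual spec types.

```rust
/// Hypothetical structured result type in the spirit of the spec's Transcript;
/// field names and types are guesses based on the discussion, not the spec.
#[derive(Debug, Clone, PartialEq)]
pub struct Transcript {
    pub text: String,
    pub is_final: bool,          // this text will not be revised further
    pub is_endpoint: bool,       // a speech boundary was detected here
    pub start_secs: f32,         // timing within the audio stream
    pub end_secs: f32,
    pub confidence: Option<f32>, // None for engines that expose no score
}
```

Returning a struct like this (instead of a bare String) is what lets a unified streaming API carry per-engine extras such as endpointing and confidence without changing the method signatures later.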

pi-anl added 4 commits March 2, 2026 19:24
push_samples now returns Vec<StreamingSegment> instead of String.
Each segment carries an is_endpoint flag indicating whether the text
ends at a sentence boundary (. ? !) detected from the model's
punctuated output. Includes split_at_sentence_boundaries helper with
unit tests.
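The boundary-splitting helper the commit describes might look roughly like this: split text after '.', '?' or '!' so each returned piece ends at a sentence boundary. This is a hedged sketch; the real split_at_sentence_boundaries may handle whitespace, abbreviations, or segment metadata differently.

```rust
/// Split text after sentence-terminating punctuation (. ? !), keeping any
/// trailing unterminated text as a final piece. Sketch only; the helper in
/// the commit may differ in details.
fn split_at_sentence_boundaries(text: &str) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if matches!(ch, '.' | '?' | '!') {
            // Sentence ended: emit the piece and start a fresh buffer.
            pieces.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        pieces.push(current); // trailing text with no terminal punctuation
    }
    pieces
}
```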
andrewleech force-pushed the feat/nemotron-streaming branch from a8aebcd to 03d5782 on March 2, 2026 at 11:29

4 participants