feat: add Qwen3-ASR batch transcription engine#48

Open
andrewleech wants to merge 6 commits into cjpais:main from andrewleech:feat/qwen3-batch

Conversation

@andrewleech
Contributor

@andrewleech andrewleech commented Mar 4, 2026

Summary

Adds engines/qwen3 module implementing TranscriptionEngine for Qwen3-ASR, Alibaba's multilingual speech recognition model. Supports 0.6B and 1.7B model variants.

New qwen3 Cargo feature — follows the same pattern as existing ONNX-based engines (parakeet, moonshine, sense_voice): feature-gated, uses ort + ndarray, CPU execution.

Engine details

  • Encoder-decoder architecture with autoregressive token generation
  • Log-mel spectrogram feature extraction (80-bin, via rustfft)
  • HuggingFace BPE tokenizer
  • Language prefix stripping — the model outputs language <Name><text>, matched against known language names to find the boundary
  • Supports both FP32 and INT8 quantized models (selected via Qwen3ModelParams)
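The prefix-stripping step above can be sketched as follows (a minimal Python sketch of logic the PR implements in Rust; the language list and helper name here are illustrative, not the engine's actual identifiers):

```python
# Illustrative language list; the engine matches against its full set of
# known language names.
KNOWN_LANGUAGES = ("English", "Chinese", "German", "French", "Japanese")

def strip_language_prefix(raw: str) -> str:
    """The model emits 'language <Name><text>'; find the boundary by
    matching known language names and return only the text."""
    prefix = "language "
    if raw.startswith(prefix):
        rest = raw[len(prefix):]
        for name in KNOWN_LANGUAGES:
            if rest.startswith(name):
                return rest[len(name):]
    return raw  # no recognised prefix: pass through unchanged
```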

Pre-exported ONNX models

Export scripts and methodology: andrewleech/qwen3-asr-onnx

Resolves #30

@andrewleech
Contributor Author

Oh, I only just saw that #46 beat me by a day. I see the note about the refactor and I'm happy to rebuild this afterwards if it looks useful!

This is also very much AI-driven code; I'm more of an embedded C / MicroPython / Python developer professionally and not yet fluent in Rust.

If you're interested, I've got some other changes "ready" to push up that enable various ort GPU integrations through here into Handy, though on this model they ended up being slower than CPU on my machine with an AMD integrated GPU.

andrewleech pushed a commit to andrewleech/Handy that referenced this pull request Mar 4, 2026
Points at andrewleech/transcribe-rs feat/qwen3-batch (PR cjpais/transcribe-rs#48).
Drop this commit once qwen3 support is published to crates.io.
@cjpais
Owner

cjpais commented Mar 4, 2026

Thank you for the contribution; I largely prefer ONNX implementations. I will try to test it in the coming days and pull it in.

I also do want to bring in acceleration support, and it will come here first before Handy. That will probably come after the refactor so we have a cleaner base to work from. This PR will probably have to wait for the refactor as well, but I would guess it will be an easy port.

@cjpais
Owner

cjpais commented Mar 7, 2026

@andrewleech if you don't mind refactoring this into the new structure, that would be great. Maybe parts of it can be simplified. Let me know.

@andrewleech
Contributor Author

Thanks for the review feedback on the initial draft. Here's a summary of what changed between the original submission and this revision:

Rebased onto current main (post-PR #51 "Reorganize Library into Engines more clearly"). The engine now implements the SpeechModel trait (capabilities() + transcribe(&mut self, &[f32], &TranscribeOptions)) and uses the shared src/onnx/session.rs helpers (create_session, resolve_model_path, Quantization) and TranscribeError throughout. The old TranscriptionEngine associated-type API and Qwen3Error type are gone.

Why mel.rs is not shared with src/features/mel.rs: Qwen3-ASR requires a Slaney-normalized mel filterbank (matching Whisper's feature extractor) computed in f64. The shared mel pipeline uses HTK normalization in f32. These are incompatible at the numeric level. A MelScale enum in the shared module would be the right long-term fix; that's noted in a TODO comment in qwen3/mel.rs.
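For illustration, the Slaney-vs-HTK filterbank difference boils down to whether each triangular filter is area-normalized. A pure-Python sketch (filter edges given in FFT-bin units here for brevity; the real pipeline computes edges in Hz and, for Qwen3, in f64):

```python
def mel_triangle(center, lo, hi, n_bins, slaney=False):
    """One triangular mel filter over FFT bins.  With slaney=True the
    filter is area-normalized (peak = 2 / bandwidth), matching Whisper's
    feature extractor; without it the triangle peaks at 1.0 (HTK-style)."""
    w = []
    for k in range(n_bins):
        if lo < k <= center:
            w.append((k - lo) / (center - lo))   # rising edge
        elif center < k < hi:
            w.append((hi - k) / (hi - center))   # falling edge
        else:
            w.append(0.0)
    if slaney:
        scale = 2.0 / (hi - lo)                  # equal-area normalization
        w = [x * scale for x in w]
    return w
```

The two conventions produce systematically different filter magnitudes, which is why the outputs are incompatible at the numeric level.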

Integration tests added (tests/qwen3.rs):

  • test_qwen3_transcribe — 0.6B model on jfk.wav, asserts exact transcript
  • test_qwen3_1_7b_transcribe — same for 1.7B model
  • test_qwen3_max_tokens_truncation — verifies transcribe_with(&Qwen3Params { max_tokens: 5 }) produces a non-empty result shorter than the full transcript

All three skip gracefully when model files are not present, so CI passes without the ~1 GB model artifacts.
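The skip-if-missing pattern amounts to an up-front existence check, roughly like this (Python sketch; the actual tests are Rust, and the required file names here are assumptions):

```python
import pathlib

# Illustrative artifact names; the engine's loader defines the real set.
REQUIRED = ("encoder.onnx", "config.json")

def model_available(model_dir) -> bool:
    """Integration tests call this first and return early (pass) when
    the ~1 GB model artifacts are absent, so CI stays green."""
    d = pathlib.Path(model_dir)
    return all((d / name).is_file() for name in REQUIRED)
```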

Other changes since the initial draft:

  • MelConfig in config.rs renamed to Qwen3MelParams to avoid a name collision with crate::features::MelConfig
  • greedy_decode made private (only called from transcribe in the same impl)
  • encode: avoids a redundant clone (mel.view().into_dyn()) and uses into_dimensionality::<Ix3>() instead of manual from_shape_vec
  • SpecialTokens non-negative validation added at load time
  • log::warn! on the right-side reflect-padding fallback path in mel.rs
  • strip_language_prefix warns before returning empty string on unrecognised-language-no-newline case

@cjpais
Owner

cjpais commented Mar 10, 2026

@andrewleech can you check your onnx export? I suspect something is not right. The raw FP32 safetensors for Qwen 0.6B is 1.88GB. I had to download a file much larger than that, which uncompresses into 6GB. I think there are probably a lot of duplicated tensors in your export. I think it would make sense to not split .onnx and .data where possible too. I would look at some other ONNX exports of other models (either by the sherpa team or istupakov) for more canonical formatting

I am not impressed by the speed of the inference either, though the transcription quality is good. I bet this performance can be improved. It is 10x slower than Parakeet of the same size, which is quite surprising.

andrewleech force-pushed the feat/qwen3-batch branch 2 times, most recently from 20043f8 to 6c61294 (March 10, 2026 09:40)
@andrewleech
Contributor Author

Follow-up to the previous comment addressing the export size concern.

Root cause: The original export produced two separate ONNX files (decoder_init.onnx + decoder_step.onnx) backed by separate .data weight files. Both wrappers held references to the same PyTorch decoder parameters, but ONNX wrote each wrapper's weights independently — resulting in full duplication (~2.38 GB × 2 = 4.76 GB for decoders alone, 5.8 GB total for the 0.6B model).

Fix: Added a unified DecoderWrapper to the export script that handles both prefill (past_seq=0) and decode steps in a single ONNX graph. The attention mask is constructed as cat([zeros(q_len, past_seq), causal_triu(q_len, q_len)], dim=1) — when past_seq=0 this reduces to just the causal block, and the whole expression traces cleanly through torch.export.
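The mask construction can be sketched in plain Python (assuming the usual additive-mask convention, 0.0 = attend, -inf = masked):

```python
NEG_INF = float("-inf")

def unified_decoder_mask(q_len, past_seq):
    """cat([zeros(q_len, past_seq), causal_triu(q_len, q_len)], dim=1):
    full attention over the cached prefix, causal over the new block.
    With past_seq=0 this reduces to a plain causal mask, so one ONNX
    graph serves both prefill and per-token decode."""
    mask = []
    for i in range(q_len):
        row = [0.0] * past_seq  # cached keys: always visible
        row += [0.0 if j <= i else NEG_INF for j in range(q_len)]
        mask.append(row)
    return mask
```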

Result:

  • FP32: 5.8 GB → ~3.1 GB (encoder 717 MB + decoder 2.38 GB + embeddings + tokenizer)
  • INT8: 4.2 GB → ~1.6 GB (encoder 734 MB + decoder 569 MB + fp16 embeddings)

The Rust library now auto-detects format at load time: tries decoder.onnx first, falls back to the legacy decoder_init.onnx + decoder_step.onnx split for backward compatibility. All 3 integration tests pass (0.6B unified, 1.7B split, max-tokens truncation), and compare.py confirms exact token agreement between unified FP32, quantized INT8, and native PyTorch inference.
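The auto-detection described here reduces to a simple existence check, sketched below (Python; the real code is Rust and the function name is illustrative):

```python
import pathlib

def detect_decoder_format(model_dir):
    """Prefer the unified decoder.onnx; fall back to the legacy
    decoder_init.onnx + decoder_step.onnx split."""
    d = pathlib.Path(model_dir)
    if (d / "decoder.onnx").is_file():
        return "unified"
    if (d / "decoder_init.onnx").is_file() and (d / "decoder_step.onnx").is_file():
        return "split"
    raise FileNotFoundError(f"no decoder model found in {d}")
```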

The export tooling is at https://github.com/andrewleech/qwen3-asr-onnx — the --split-decoder flag preserves the old format if needed.

@xkcoding

Nice work @andrewleech! 👍 I closed my PR #46 (qwen-asr crate based) in favor of this — the ONNX approach with ort fits the project much better, and the unified decoder fix cutting the model size in half is impressive.

Great to see you tackled the ONNX export quality issues too. Looking forward to seeing this merged!

@cjpais
Owner

cjpais commented Mar 11, 2026

The Rust library now auto-detects format at load time: tries decoder.onnx first, falls back to the legacy decoder_init.onnx + decoder_step.onnx split for backward compatibility. All 3 integration tests pass (0.6B unified, 1.7B split, max-tokens truncation), and compare.py confirms exact token agreement between unified FP32, quantized INT8, and native PyTorch inference.

I don't think we need to support legacy stuff. It adds bloat and this is a fresh PR.

Can you please upload the files to HF?

I'm a bit skeptical right now of pulling this in to be honest, this has been a bit sloppy so far. Was there any sanity checking done?

@cjpais
Owner

cjpais commented Mar 17, 2026

Mind giving the models you have for this?

andrewleech force-pushed the feat/qwen3-batch branch 5 times, most recently from 1789c06 to 52f49e6 (March 18, 2026 12:05)
@andrewleech
Contributor Author

Update: branch rewritten onto v0.3.2

The branch has been rebased onto upstream/main (v0.3.2) and the commit history rebuilt as two clean commits.

What changed since the initial push

Adapted to upstream v0.3 API:

Model format changes:

  • Hybrid decoder: decoder_init accepts input_ids + audio_features (embedding table in graph for prefill scatter); decoder_step accepts input_embeds (Rust-side lookup from embed_tokens.bin)
  • Split decoder preferred over unified (decoder_init.onnx + decoder_step.onnx)
  • INT8/INT4 decoder variants auto-detected via suffixed filenames (e.g. decoder_init.int4.onnx)

Performance work (model.rs 550 → 417 lines):

  • Sequential ORT execution mode + CPU arena allocator for decoder sessions (create_decoder_session)
  • Zero-copy KV cache via DynValue pass-through (eliminates per-step clone)
  • Vectorized argmax over contiguous logit slice
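The greedy step reduces to an argmax over the final position's contiguous logit slice, e.g. (Python sketch of the Rust logic; function name illustrative):

```python
def argmax_next_token(logits, vocab_size):
    """Greedy decode step: argmax over the logit slice for the last
    position only.  The Rust version vectorizes this same scan over a
    contiguous slice."""
    last = logits[-vocab_size:]
    return max(range(vocab_size), key=lambda i: last[i])
```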

New:

  • Quantization::Int4 variant for MatMulNBits models (wired through all engines)
  • bench_compare example with --help, accelerator selection, quantization flags
  • Integration tests for 0.6B, 1.7B, and 1.7B-int4

Model file structure

The model directory layout follows the same encoder/decoder split pattern used by Moonshine and Canary in this repo:

encoder.onnx        # FP32 (all variants use FP32 encoder)
decoder_init.onnx   # prefill: audio features + token IDs → first KV cache
decoder_step.onnx   # per-token: input embeds + KV cache → next token
embed_tokens.bin    # FP32 embedding table [vocab_size, hidden_size]
config.json         # model dimensions, special token IDs, quantization metadata
vocab.json          # SentencePiece vocabulary

Quantized decoder variants use suffixed filenames (e.g. decoder_init.int4.onnx) alongside the FP32 originals.

The main departure from the other engines is embed_tokens.bin. Qwen3-ASR ties its embedding and lm_head weights (standard for decoder-only transformers). decoder_init needs the embedding table in-graph for the prefill scatter, but duplicating it into decoder_step would add 594 MB of redundant weights. Extracting it as a flat binary lets Rust do a single load and fast row lookups during autoregressive decoding. We tested the alternative (keeping the table in decoder_step as a shared ONNX initializer with lm_head) but the required transpose on every token step made inference 2.5× slower.
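The flat-binary lookup is cheap because a token's embedding is just a contiguous slice of the file, e.g. (Python sketch; the engine does the equivalent in Rust, and little-endian FP32 storage is assumed here):

```python
import struct

def embed_row(table: bytes, token_id: int, hidden: int):
    """Row lookup into a flat [vocab, hidden] FP32 table as stored in
    embed_tokens.bin: seek to token_id * row_size, read one row."""
    offset = token_id * hidden * 4  # 4 bytes per f32
    return list(struct.unpack_from(f"<{hidden}f", table, offset))
```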

Model refinement

The ONNX export pipeline and quantization approach went through ~90 experiments covering AWQ smoothing, GPTQ calibration, int4/int8 MatMul-only quantization, accuracy_level tuning, and encoder quantization impact. Full experiment log: https://github.com/andrewleech/qwen3-asr-onnx/blob/main/INVESTIGATION.md

Key findings:

  • accuracy_level=4 on int4 MatMulNBits improves both speed and WER
  • INT8 encoder degrades WER by ~1pp — all variants use the FP32 encoder
  • 1.7B benefits from GPTQ on decoder_init + RTN on decoder_step (GPTQ on step is too slow to calibrate for minimal gain)
  • AWQ INT8 is not recommended for 1.7B — causes degraded special token prediction (9% WER)

Recommended model variants

Two variants are published per model size — FP32 (baseline/GPU target) and int4 (recommended for CPU):

| Model | Quantization | WER | RTF | Size |
|---|---|---|---|---|
| Qwen3 0.6B | int4 (RTN al4) | 5.08% | 0.16x | ~2.6 GB |
| Qwen3 0.6B | FP32 | 4.42% | 0.40x | 3.8 GB |
| Qwen3 1.7B | int4 (GPTQ-init + RTN al4) | 4.25% | 0.37x | ~5.6 GB |
| Qwen3 1.7B | FP32 | 3.79% | | 8.8 GB |
| Parakeet 0.6B (reference) | INT8 | 5.45% | 0.16x | |

200-sample LibriSpeech test-other, CPU inference, WSL2/Linux, ORT 2.0.0-rc.12. RTF measured on 11s JFK clip. RTF < 1 = faster than real-time. Qwen3 produces full punctuation; Parakeet produces minimal.
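For reference, the real-time factor used in the table is simply:

```python
def rtf(inference_secs: float, audio_secs: float) -> float:
    """Real-time factor: processing time divided by audio duration.
    Values below 1.0 mean faster than real time."""
    return inference_secs / audio_secs
```

So 0.16x on the 11 s JFK clip corresponds to roughly 1.76 s of compute.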

The 0.6B int4 variant matches Parakeet speed with lower WER and full punctuation output.

Model downloads

Models listed above are currently being uploaded to Hugging Face:

Export pipeline and quantization tools: andrewleech/qwen3-asr-onnx

@cjpais
Owner

cjpais commented Mar 18, 2026

Thank you for doing a bunch of deep work on this. I will take a look at it soon.

I am a little confused at the int4 download size though. It's larger than the original .safetensors?

@andrewleech
Contributor Author

andrewleech commented Mar 18, 2026

Cheers, I've been using it for a couple of days now as I clean up the repos. Handy integration incoming.

In Handy (on Windows) I prefer to use PTT, Paste: Direct, Don't modify clipboard, and auto-submit Super+Enter. On a side note, I personally think these should be the defaults, though accidentally holding Ctrl down while it typed (which mucked up the text) threw me a few times.

It still doesn't feel quite as fast as Parakeet, but I do feel like the accuracy, particularly on quiet / whispered audio, is working much better.

I am a little confused at the int4 download size though. It's larger than the original .safetensors?

The original safetensors are BF16 (~1.2 GB), which I was unable to convert into an efficient 16-bit decoder format; FP32, being the more native ONNX format, initially doubled the size.

The decoder had the biggest size impact: dropping it to int4 cost the least WER. I had some success with an FP16 encoder, but it increased WER from ~5.08 to ~5.18 for a size saving from 2.5 GB down to 2.1 GB, with no change in transcription speed (beyond model load time, which scales with size). I opted to keep the slightly lower WER at a 400 MB cost, but I'm open to changing this.

So yeah, the ONNX int4 package is larger because the encoder and embedding table are kept at FP32. I made a note of this, but it's probably a bit lost in the text.

@cjpais
Owner

cjpais commented Mar 18, 2026

Sounds good, I was mostly just curious. Overall it's fine, but 2 GB is a pretty significant memory impact.

@andrewleech
Contributor Author

Yeah, it's ended up being quite a lot bigger than Parakeet, and by the numbers I'm not sure it's actually enough better to justify its size. However, for me any improvement on low-volume voice plus the improved punctuation is making me happy, and it's been an interesting learning exercise!

@cjpais
Owner

cjpais commented Mar 18, 2026

Sweet! I'm quite curious to try it; I often speak quite softly. Going to pull it down and see how it runs in transcribe-rs.

@andrewleech
Contributor Author

andrewleech commented Mar 18, 2026

The latest models haven't finished uploading yet, and I'm still cleaning up the current Handy integration branch. Estimated 2-3 hours more to upload them all.

@andrewleech
Contributor Author

The Handy branch is updated in cjpais/Handy#957, though sorry, I haven't built and re-run the latest push; it was only minor cleanups since the copy I'm running. I just need to turn in for the night and will test more tomorrow.

@cjpais
Owner

cjpais commented Mar 18, 2026

No worries, I won't get to test until tomorrow either. Just wanna confirm that whatever I download will just work out of the box. Or you'll let me know what files to download for the models.

andrewleech force-pushed the feat/qwen3-batch branch 4 times, most recently from 4853d12 to e869323 (March 22, 2026 23:53)
@andrewleech
Contributor Author

andrewleech commented Mar 23, 2026

@cjpais the two PRs for this should be in a good state for testing. I had some performance issues a few days ago that I surprisingly found came from how many CPU cores ORT was allowed to use on my machine; I started adding a feature to set/adjust this here before finally splitting it off into its own clean pair of branches/PRs.

The models are all up to date on HF as per the URLs configured in the Handy PR.

@cjpais
Owner

cjpais commented Mar 23, 2026

thanks @andrewleech I will take a closer look soon, it may take me a week or so at this point. I do have some other things I need to focus on for a bit

In regard to the CPU thread count, overall it makes sense. I think it will definitely be an option for transcribe-rs, but I am a bit hesitant to add it to Handy. I will think more on it though.

@cjpais
Owner

cjpais commented Mar 28, 2026

@andrewleech I took another look and downloaded all the files again, I am seeing the duplicate weight thing again. Can we improve this export? It would drop gigabytes from the load time which would be quite significant and right now is blocking me from shipping this. Otherwise it looks good to go

@andrewleech
Contributor Author

I am seeing the duplicate weight thing again.

Thanks for the pointers. I'd attempted to fix that previously, but in an effort to more closely match the packaging from other ONNX export teams I'd squashed the fix out again.

Regarding the two decoder-init and decoder-step weights: the int4 copies were being quantized independently into slightly different values because 1.7B was using GPTQ for init and RTN for step. I reassessed the methods, running 200-sample WER evals comparing GPTQ+RTN vs RTN-only across three independent tests. The difference is small (+0.04pp in the latest run, within run-to-run noise) and RTN-only enables weight sharing, so the tradeoff is worth it. I switched 1.7B to RTN-only, which makes the transformer layer weights byte-identical between the two decoders.

With that, the common weights are now pulled out into a shared decoder_weights.int4.data file that both decoder protos reference. The only unshared part is the lm_head (output projection) which exists in different forms between init and step — that gets inlined into the step proto (~87-171 MB depending on model size).

Also converted embed_tokens.bin to FP16 storage (cast to FP32 at lookup time): zero WER impact, and it halves the file.
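The FP16-on-disk idea is a plain load-time widening, e.g. (Python sketch using the stdlib IEEE half-precision codec; the engine side is Rust and the function name is illustrative):

```python
import struct

def load_fp16_table(raw: bytes):
    """Expand FP16-stored embedding bytes to full-precision floats at
    load time: storage halves while lookup math stays full precision."""
    n = len(raw) // 2  # 2 bytes per half-precision value
    return list(struct.unpack(f"<{n}e", raw))
```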

I investigated storing the encoder as FP16 too, using a native autocast export approach (no Cast node overhead in theory). WER was fine, but benchmarking on native Windows showed a 9-13% per-inference slowdown from the FP16/FP32 boundary Cast nodes that ORT can't fuse across, so the encoder stays FP32 for now. A possible future optimisation would be storing the encoder weights as FP16 on disk and expanding them to FP32 in the Rust loader before creating the ORT session: that would save 359 MB (0.6B) / 608 MB (1.7B) on disk with no runtime overhead, but it needs a custom loader rather than ORT's built-in file loading.

Also added ruff + mypy to the export repo: no bugs found, but ~100 lint issues cleaned up.

Net result:

| Metric | Previous | New | Change |
|---|---|---|---|
| 0.6B int4 tar.gz | 1.57 GB | 1.26 GB | -20% |
| 1.7B int4 tar.gz | 3.55 GB | 2.67 GB | -25% |
| 0.6B RTF (Windows) | 0.17x | 0.17x | unchanged |
| 1.7B RTF (Windows) | 0.37x | 0.29x | 22% faster |
| WER (0.6B / 1.7B) | 5.16% / 4.25% | 5.16% / 4.20% | -0.05pp (1.7B) |

The 1.7B speed improvement comes from dropping GPTQ; RTN-only loads and runs faster.

andrewleech force-pushed the feat/qwen3-batch branch 2 times, most recently from 5c30fb1 to 4f26ecb (March 30, 2026 22:53)
@wangwillian0

Hi, nice PR!

@pi-anl Regarding the recently added commit about the language hint, I think there is a specific template which the official Qwen3-ASR code follows: https://github.com/QwenLM/Qwen3-ASR/blob/main/qwen_asr/core/vllm_backend/qwen3_asr.py#L981-L990

<|im_start|>user\n{audio_placeholder}<|im_end|>
<|im_start|>assistant\nlanguage {full_lang_name_to}<asr_text>

@andrewleech
Contributor Author

I think there is a specific template which the official qwen3-asr code

Thanks for that, I've reworked mine to match that!

For background: I'd recently discovered an issue in this branch where, for certain recordings of various lengths, it would return just "ology.".

It was related to noisy / very low volume audio, particularly at the start, and the output was also missing the detected language tag; it's clearly just the output produced when the transcription fails in certain ways.
I found that feeding in the expected language removed the issue, and I also protect against it by filtering out any responses that don't have the language tag at the start (all valid transcriptions do).
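The tag-based filtering can be sketched as follows (Python sketch; the real guard in the PR applies additional conditions around EOS and max-token truncation, and the token ID below comes from the PR's config):

```python
ASR_TEXT_TOKEN_ID = 151704  # <asr_text> separator token ID

def guard_transcript(token_ids, decoded_text):
    """Reject degenerate outputs such as "ology.": every valid result
    contains the <asr_text> separator token, so its absence signals a
    failed transcription and we return an empty string instead."""
    if ASR_TEXT_TOKEN_ID not in token_ids:
        return ""  # drop garbage rather than passing it to the consumer
    return decoded_text
```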

pi-anl and others added 6 commits April 7, 2026 09:23
ONNX-based Qwen3-ASR speech recognition with split encoder/decoder
architecture, Whisper-compatible mel spectrogram, SentencePiece
tokenizer, and configurable quantization (FP32/FP16/INT8).

Includes decoder session infrastructure (create_decoder_session) with
sequential execution mode, CPU arena allocator, and configurable
intra-op threads for autoregressive token generation.

Performance: INT8 auto-detection, zero-copy KV cache via DynValue
pass-through, vectorized argmax, hybrid decoder with Rust-side
embed lookup. FP16 embed_tokens.bin support (dtype from config.json).

Add Quantization::Int4 to the enum and wire it through model path
resolution, bench_compare parsing, and moonshine streaming. Add
integration tests for 1.7B int4 and 0.6B int4 with FP16 embed.

- Use div_ceil() instead of manual ceiling division (mel.rs)
- Remove needless borrow (canary/decoder.rs)
- Use range contains() instead of manual comparison (moonshine/model.rs)
- Derive Default instead of manual impl (moonshine/mod.rs)

The int4 quantized decoder can produce degenerate output (e.g.
"ology.") on non-speech audio where quantization noise flips the
argmax at the first token. These outputs lack the <asr_text> separator
token that normally separates the language prefix from the
transcription.

Check for the presence of asr_text_token_id (151704) in the generated
tokens. If absent, return empty string instead of passing garbage
through to the consumer. Logs a warning with the first 20 token IDs
for diagnostic purposes.

Adds asr_text_token_id to SpecialTokens config struct with serde
default for backward compatibility with existing config.json files.

style: fix cargo fmt formatting in session.rs

When TranscribeOptions.language is set (e.g. "English"), the decoder
prompt includes "Please transcribe the above {language} audio." which
conditions the decoder toward the specified language. This eliminates
the "ology" degenerate output on non-speech audio (see OLOGY_BUG.md)
and aligns int4 output with FP32 behavior.

Language token IDs are encoded on first use via greedy longest-match
on the BPE vocabulary and cached in RAM for reuse.

Changes:
- tokenizer.rs: Add encode() with reverse vocabulary lookup
- prompt.rs: Add build_prompt_ids_with_language() with template tokens
- model.rs: Thread language_token_ids through greedy_decode
- engine.rs: Cache language tokens, pass options.language through
  instead of warning. Qwen3Params gains a language field.

fix: address review findings for language hint implementation

- Eliminate clone on cache hit: ensure_language_cached() + borrow
  from cache instead of returning owned Vec
- Unify prompt builders: build_prompt_ids delegates to
  build_prompt_ids_with_language(_, _, None), single code path
- Add BCP-47 normalization: "en" → "English" so TranscribeOptions
  language codes work correctly (14 common codes)
- Trim normalize_language_name to common codes only

simplify: remove normalize_language_name, pass language string directly

The model tokenizes whatever language string is given and includes it
in the prompt. No need to map BCP-47 codes to full names — the model
handles both "en" and "English" in the prompt context.

fix: Qwen3Params::default() max_tokens 0 bug, empty language guard

- Implement Default manually with max_tokens=512 instead of derive
  (derive produced max_tokens=0 which silently truncated output)
- Filter empty language strings to None to avoid malformed prompt
- Document that language accepts both full names and short codes

fix: address branch review findings (5 warnings, 5 infos)

- asr_text guard: only apply when EOS was seen, not on max_tokens
  truncation (fixes conflict with truncation test)
- Add asr_text_token_id >= 0 to load-time validation
- Mark tokenizer encode() as pub(crate) to prevent misuse on long text
- Use ..Default::default() in transcribe_raw instead of hardcoded 512
- Fix dangling OLOGY_BUG.md doc reference
- Fix cfg(test) function doc reference
- Add unit tests for language-conditioned prompt structure and
  None-path equivalence with standard prompt

refactor: use official Qwen3-ASR language hint template

Replace the instruction-based language hint (modified system/user turns)
with the official Qwen3-ASR template that forces the assistant prefix:
  <|im_start|>assistant\nlanguage {Name}<asr_text>

This is a "forced generation" pattern — the model skips language
detection entirely and goes straight to transcription after <asr_text>.
Matches the reference implementation in qwen_asr/core/vllm_backend.

Changes:
- prompt.rs: Remove SYSTEM_CONTENT, USER_PREFIX, USER_SUFFIX_* constants.
  Language hint now appends to assistant prefix instead of modifying
  system/user turns. System and user turns are identical with or without
  language hint.
- engine.rs: Encode " {name}" (with leading space) to match BPE
  tokenization of "language English" → [11528, 6364].
- model.rs: Skip <asr_text> guard when language is forced (the token
  is in the prompt, not in generated output).


Development

Successfully merging this pull request may close these issues.

[model request] Qwen3 ASR

5 participants