Adds cohere-transcribe INT4/INT8 via ONNX Runtime #75
Conversation
Tested end-to-end on Windows 11 with an RTX 3090, ONNX Runtime 2.0.0-rc.12. The model loads and transcribes correctly on CPU. DirectML fails at inference with `Non-zero status code returned while running Reshape node. Name:'node_view_332'`: INT4 weight-only quantization is not compatible with the DirectML execution provider, so an FP16 or INT8 export would be needed for GPU acceleration. I have not tested those exports yet because my use case is CPU-only for now, but given the model's quality relative to its size, it is probably an investment worth making.
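Since the INT4 graph loads fine but only fails inside DirectML once the Reshape node actually runs, one practical pattern is to try providers in order and fall back to CPU on the first runtime failure. A minimal sketch of that fallback logic, using simulated loader callables (`first_working`, `directml`, and `cpu` are illustrative names, not from the PR or the ONNX Runtime API):

```python
def first_working(loaders):
    """Try (name, loader) pairs in order; return the first successful
    (name, result). Each loader stands in for provider-specific session
    setup plus a warm-up inference, since the reported DirectML INT4
    failure only surfaces at run time, not at session creation."""
    last_error = None
    for name, loader in loaders:
        try:
            return name, loader()
        except RuntimeError as exc:
            last_error = exc  # remember why this provider failed
    raise RuntimeError("no execution provider succeeded") from last_error


# Simulated providers: DirectML rejects the INT4 graph, CPU works.
def directml():
    raise RuntimeError("Non-zero status code returned while running Reshape node")

def cpu():
    return "cpu-session"

provider, session = first_working([("DmlExecutionProvider", directml),
                                   ("CPUExecutionProvider", cpu)])
print(provider)  # falls back past DirectML to the CPU provider
```

With the real onnxruntime Python package you would normally just pass `providers=["DmlExecutionProvider", "CPUExecutionProvider"]` to `InferenceSession`, but that ordering only covers session creation; a warm-up inference like the sketch above is needed to catch failures that, as here, only appear when a node executes.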
Can you provide the ONNX download links you used? I will pull this in as soon as I can test it.
Sure. Model files: https://huggingface.co/cstr/cohere-transcribe-onnx-int4
Tarball I packaged for the Handy integration (encoder + decoder + tokens.txt): https://github.com/praxeo/Handy/releases/download/v1.0.0-cohere/cohere-int4.tar.gz
Thank you! Hope to get this merged in a few hours |
I believe Handy will ship the INT8 variant: https://huggingface.co/tristanripke/cohere-transcribe-onnx-int8