
adds cohere-transcribe INT4/INT8 via onnx runtime #75

Merged
cjpais merged 7 commits into cjpais:main from praxeo:cohere-onnx-int4
Apr 1, 2026

Conversation

@praxeo
Contributor

@praxeo praxeo commented Apr 1, 2026

cstr/cohere-transcribe-onnx-int4
CPU inference only
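
Since the PR targets CPU inference only, a minimal load under ONNX Runtime might look like the following sketch. The model filename and session options here are assumptions for illustration, not taken from this PR:

```python
try:
    import onnxruntime as ort  # pip install onnxruntime
except ImportError:
    ort = None  # keep the sketch importable where the package is absent

MODEL_PATH = "cohere-transcribe-int4.onnx"  # hypothetical filename

def make_cpu_session(path):
    """Create an inference session pinned to the CPU execution provider."""
    opts = ort.SessionOptions()
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    # Listing only CPUExecutionProvider prevents ORT from trying GPU backends.
    return ort.InferenceSession(path, sess_options=opts,
                                providers=["CPUExecutionProvider"])
```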

@praxeo
Contributor Author

praxeo commented Apr 1, 2026

Tested end-to-end on Windows 11 with an RTX 3090, ONNX Runtime 2.0.0-rc.12. The model loads and transcribes correctly on CPU. DirectML fails with an inference error (`Non-zero status code returned while running Reshape node. Name:'node_view_332'`): INT4 weight-only quantization is not compatible with the DirectML execution provider. An FP16 or INT8 export would be needed for GPU acceleration. I have not tested those yet, since my use case is CPU-only at this time, but given the model's quality relative to its size, it's probably an investment worth making.
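
Given the incompatibility above, a caller could select providers defensively: prefer DirectML only when it is available and the graph is not INT4 weight-only quantized, otherwise fall back to CPU. This is a sketch of one possible policy, not code from this PR; the `int4_weights` flag is a hypothetical caller-supplied hint:

```python
try:
    import onnxruntime as ort
except ImportError:
    ort = None  # allow the selection logic to be used without onnxruntime

def choose_providers(available, int4_weights):
    """Pick ONNX Runtime execution providers.

    DirectML cannot run INT4 weight-only quantized graphs (the Reshape
    node failure reported in this PR), so force CPU for INT4 models.
    """
    if int4_weights or "DmlExecutionProvider" not in available:
        return ["CPUExecutionProvider"]
    # CPU stays last as a fallback for ops DirectML does not cover.
    return ["DmlExecutionProvider", "CPUExecutionProvider"]

if ort is not None:
    providers = choose_providers(ort.get_available_providers(),
                                 int4_weights=True)
    # session = ort.InferenceSession(model_path, providers=providers)
```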

@cjpais
Owner

cjpais commented Apr 1, 2026

Can you provide the ONNX download links you used?

I will pull it in as soon as I can test

@praxeo
Contributor Author

praxeo commented Apr 1, 2026

@cjpais
Owner

cjpais commented Apr 1, 2026

Thank you! Hope to get this merged in a few hours

@cjpais
Owner

cjpais commented Apr 1, 2026

@cjpais cjpais changed the title adds cohere-transcribe INT4 via onnx runtime adds cohere-transcribe INT4/INT8 via onnx runtime Apr 1, 2026
@cjpais cjpais merged commit 2d7ac18 into cjpais:main Apr 1, 2026
4 checks passed