Summary
This is a feature request to add CTranslate2-based Whisper transcription using the ct2rs Rust crate, which would provide significantly faster performance (10-40x real-time) compared to the current whisper.cpp implementation, especially on NVIDIA GPUs with CUDA.
Motivation
Currently, Handy uses whisper.cpp via transcribe-rs, which provides good cross-platform compatibility with Vulkan support. However, on NVIDIA hardware transcription performance is suboptimal.
While Parakeet offers excellent CPU performance, some users find Whisper models provide better transcription quality for their use cases.
Related Discussion
This builds on the discussion in #58, where @cjpais stated:
"This has been discussed before, we will not use fasterwhisper at the moment. If there are nice versions of whisper with Ctranslate2 that have rust bindings I will consider it."
Good news: Such bindings exist! 🎉
Proposed Solution: ct2rs
ct2rs is a production-ready Rust crate that provides native bindings to CTranslate2 (the same engine that powers Python's faster-whisper).
Key Features
- ✅ Rust-native - No Python dependency required
- ✅ Whisper support - Via the `whisper` feature flag
- ✅ CUDA acceleration - For NVIDIA GPUs
- ✅ ROCm support - For AMD GPUs
- ✅ Same model format - Compatible with faster-whisper models
- ✅ Actively maintained - Latest version 0.9.10 (MIT licensed)
- ✅ Production-ready - Used in several projects
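If adopted, the dependency could be wired in behind a feature gate. A hypothetical `Cargo.toml` entry is sketched below; the `whisper` feature is documented by the crate, but the exact name of the CUDA feature is an assumption and should be checked against the ct2rs docs:

```toml
[dependencies]
# ct2rs with Whisper support enabled.
# NOTE: the "cuda" feature name is an assumption — verify it
# against the crate's documentation before depending on it.
ct2rs = { version = "0.9", features = ["whisper", "cuda"] }
```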
Performance Benefits
CTranslate2 provides significant performance improvements:
- 4x faster than original OpenAI Whisper implementation
- Better GPU utilization with optimized CUDA/cuBLAS kernels
- Lower memory usage with INT8/FP16 quantization support
- Optimized for inference - Purpose-built for production transcription
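For context on the model-format compatibility claim: faster-whisper models are produced with CTranslate2's converter, so Handy could reuse the same conversion pipeline. A typical invocation (the model name and output directory here are illustrative; requires the `ctranslate2` Python package for the one-time conversion step):

```shell
# Convert a Hugging Face Whisper checkpoint to CTranslate2 format
# with FP16 weights — the resulting directory is what ct2rs would load.
ct2-transformers-converter --model openai/whisper-small \
    --output_dir whisper-small-ct2 \
    --copy_files tokenizer.json \
    --quantization float16
```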
Benchmark comparison (from this gist):
- whisper.cpp (CUDA): ~30-124 seconds for ~2min audio
- CTranslate2 (CUDA float16): ~2.5-12.9 seconds for same audio
References:
Issue #58: cjpais/Handy#58
CTranslate2 docs: https://opennmt.net/CTranslate2/
faster-whisper (Python): https://github.com/SYSTRAN/faster-whisper