[Feature Request] Add CTranslate2 backend via ct2rs for much faster Whisper transcription w/ CUDA support #10

@rodalpho

Description
Summary

This is a feature request to add CTranslate2-based Whisper transcription using the ct2rs Rust crate, which would provide significantly faster performance (10-40x real-time) compared to the current whisper.cpp implementation, especially on NVIDIA GPUs with CUDA.

Motivation

Currently, Handy uses whisper.cpp via transcribe-rs, which provides good cross-platform compatibility with Vulkan support. However, transcription performance on NVIDIA hardware is suboptimal.

While Parakeet offers excellent CPU performance, some users find Whisper models provide better transcription quality for their use cases.

Related Discussion

This builds on the discussion in #58, where @cjpais stated:

"This has been discussed before, we will not use fasterwhisper at the moment. If there are nice versions of whisper with Ctranslate2 that have rust bindings I will consider it."

Good news: Such bindings exist! 🎉

Proposed Solution: ct2rs

ct2rs is a production-ready Rust crate that provides native bindings to CTranslate2 (the same engine that powers Python's faster-whisper).

Key Features

  • Rust-native - No Python dependency required
  • Whisper support - Via whisper feature flag
  • CUDA acceleration - For NVIDIA GPUs
  • ROCm support - For AMD GPUs
  • Same model format - Compatible with faster-whisper models
  • Actively maintained - Latest version 0.9.10 (MIT licensed)
  • Production-ready - Used in several projects
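
If this direction is pursued, pulling the crate in is a one-line change. A sketch, assuming the `whisper` and `cuda` feature names listed in the ct2rs README (verify the exact flags against the crate docs before use):

```shell
# Sketch only: add ct2rs with Whisper support and CUDA acceleration enabled.
# Feature names are assumptions from the ct2rs README — confirm before adopting.
cargo add ct2rs --features whisper,cuda
```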

Performance Benefits

CTranslate2 provides significant performance improvements:

  • 4x faster than the original OpenAI Whisper implementation
  • Better GPU utilization with optimized CUDA/cuBLAS kernels
  • Lower memory usage with INT8/FP16 quantization support
  • Optimized for inference - Purpose-built for production transcription

Benchmark comparison (from this gist):

  • whisper.cpp (CUDA): ~30-124 seconds for ~2min audio
  • CTranslate2 (CUDA float16): ~2.5-12.9 seconds for same audio
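
Since ct2rs consumes the same converted model directories as faster-whisper, existing CTranslate2 tooling applies directly. A rough sketch of converting an OpenAI Whisper checkpoint with FP16 quantization, using CTranslate2's documented converter (model name and output path here are just examples):

```shell
# Sketch: convert a Hugging Face Whisper checkpoint to CTranslate2 format
# with float16 quantization. Requires: pip install ctranslate2 transformers
ct2-transformers-converter --model openai/whisper-small \
    --output_dir whisper-small-ct2 \
    --quantization float16
```

Pre-converted models published for faster-whisper (e.g. under the Systran org on Hugging Face) should also work without a local conversion step.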

References:

  • Issue #58: cjpais/Handy#58
  • CTranslate2 docs: https://opennmt.net/CTranslate2/
  • faster-whisper (Python): https://github.com/SYSTRAN/faster-whisper
