[Feature Request] Add CTranslate2 backend via ct2rs for much faster Whisper transcription w/ CUDA support #10

@rodalpho

Description
Summary

This is a feature request to add CTranslate2-based Whisper transcription using the ct2rs Rust crate, which would provide significantly faster performance (10-40x real-time) compared to the current whisper.cpp implementation, especially on NVIDIA GPUs with CUDA.

Motivation

Currently, Handy uses whisper.cpp via transcribe-rs, which provides good cross-platform compatibility with Vulkan support. However, transcription performance on NVIDIA hardware is suboptimal.

While Parakeet offers excellent CPU performance, some users find Whisper models provide better transcription quality for their use cases.

Related Discussion

This builds on the discussion in #58, where @cjpais stated:

"This has been discussed before, we will not use fasterwhisper at the moment. If there are nice versions of whisper with Ctranslate2 that have rust bindings I will consider it."

Good news: Such bindings exist! 🎉

Proposed Solution: ct2rs

ct2rs is a production-ready Rust crate that provides native bindings to CTranslate2 (the same engine that powers Python's faster-whisper).

Key Features

  • Rust-native - No Python dependency required
  • Whisper support - Via whisper feature flag
  • CUDA acceleration - For NVIDIA GPUs
  • ROCm support - For AMD GPUs
  • Same model format - Compatible with faster-whisper models
  • Actively maintained - Latest version 0.9.10 (MIT licensed)
  • Production-ready - Used in several projects
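
If this direction is pursued, pulling the crate in is a one-line change. A sketch, assuming the `whisper` and `cuda` feature names listed in the ct2rs README (verify the exact flags against the crate docs before use):

```shell
# Sketch only: add ct2rs with Whisper support and CUDA acceleration enabled.
# Feature names are assumptions from the ct2rs README — confirm before adopting.
cargo add ct2rs --features whisper,cuda
```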

Performance Benefits

CTranslate2 provides significant performance improvements:

  • 4x faster than the original OpenAI Whisper implementation
  • Better GPU utilization with optimized CUDA/cuBLAS kernels
  • Lower memory usage with INT8/FP16 quantization support
  • Optimized for inference - Purpose-built for production transcription

Benchmark comparison (from this gist):

  • whisper.cpp (CUDA): ~30-124 seconds for ~2min audio
  • CTranslate2 (CUDA float16): ~2.5-12.9 seconds for same audio
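
Since ct2rs consumes the same converted model directories as faster-whisper, existing CTranslate2 tooling applies directly. A rough sketch of converting an OpenAI Whisper checkpoint with FP16 quantization, using CTranslate2's documented converter (model name and output path here are just examples):

```shell
# Sketch: convert a Hugging Face Whisper checkpoint to CTranslate2 format
# with float16 quantization. Requires: pip install ctranslate2 transformers
ct2-transformers-converter --model openai/whisper-small \
    --output_dir whisper-small-ct2 \
    --quantization float16
```

Pre-converted models published for faster-whisper (e.g. under the Systran org on Hugging Face) should also work without a local conversion step.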

References:

  • Issue #58: cjpais/Handy#58
  • CTranslate2 docs: https://opennmt.net/CTranslate2/
  • faster-whisper (Python): https://github.com/SYSTRAN/faster-whisper
