ComfyUI-TranslateGemma

A ComfyUI integration for TranslateGemma, Google's open-source translation model family built on Gemma 3. It supports 55 languages, multimodal image-to-text translation, and three model sizes aimed at mobile (4B), local (12B), and cloud (27B) inference.

TranslateGemma: A new suite of open translation models


01/2026 update:

  • Added chinese_conversion_only + chinese_conversion_direction for fast Simplified↔Traditional conversion via OpenCC (no model load).
  • Added max_new_tokens=0 / max_input_tokens=0 as Auto token budgeting (context-aware).
  • Added long_text_strategy (disable / auto-continue / segmented) to mitigate “early stop” on long documents.
  • Added optional BitsAndBytes quantization (quantization: none / bnb-8bit / bnb-4bit) for best-effort VRAM reduction.
  • Added a node UI ? help modal and 0 = auto labels for max token widgets.
  • Improved Hugging Face download diagnostics (network/auth/disk hints + retries) and added troubleshooting guidance (proxy/mirror/offline).

Features

  • Text translation across 55 languages
  • Model size selection: 4B / 12B / 27B
  • First-run auto download via Hugging Face (requires accepting Gemma terms)
  • Flexible inputs: built-in text box + external string input
  • Optional image input: translate text found in images (multimodal)
  • Chinese conversion-only mode: Simplified↔Traditional conversion via OpenCC without loading the model (TG-038)

Installation

Option A: ComfyUI-Manager

  1. Open ComfyUI-Manager.
  2. Search for TranslateGemma.
  3. Install and restart ComfyUI.

Option B: Manual

  1. Clone into your ComfyUI custom_nodes directory (from your ComfyUI root):
     cd custom_nodes
     git clone https://github.com/rookiestar28/ComfyUI-TranslateGemma.git
  2. Install dependencies:
     cd ComfyUI-TranslateGemma
     pip install -r requirements.txt
  3. Restart ComfyUI.

Hugging Face Access (Gated Models)

TranslateGemma repos are gated under the Gemma terms.

  1. Visit the model page and accept the license terms:
    • google/translategemma-4b-it
    • google/translategemma-12b-it
    • google/translategemma-27b-it
  2. Authenticate (recommended):
     hf auth login

     Alternatively, set one of these environment variables for the ComfyUI process:
    • HF_TOKEN
    • HUGGINGFACE_HUB_TOKEN
  3. Restart ComfyUI after changing authentication.

Download Troubleshooting (Hugging Face)

If the model download stalls at Fetching ... or fails with connection errors, the cause is usually the network rather than the node. Common causes: unstable network, corporate firewall/proxy, DNS issues, or regions where huggingface.co is blocked (for example, some networks in China).

Things to try:

  • Retry: downloads are resumable; restarting ComfyUI often continues where it left off.
  • Proxy: set HTTP_PROXY / HTTPS_PROXY for the ComfyUI process.
  • Mirror endpoint (community): set HF_ENDPOINT (or HUGGINGFACE_HUB_ENDPOINT) to a mirror URL, then restart ComfyUI.
  • Offline/manual: download the model on a machine that can reach Hugging Face, then copy the downloaded folder into the model cache directory (see Model Storage Location below) and restart ComfyUI.

Notes:

  • Community mirrors are not official; availability and correctness are not guaranteed.
  • If you see 401/403/gated/forbidden, you likely need to accept the license and/or set HF_TOKEN.

Model Storage Location

Models are stored under ComfyUI's models directory in a per-repo folder:

  • Preferred: ComfyUI/models/LLM/TranslateGemma/<repo_name>/
  • Fallback (legacy): ComfyUI/models/translate_gemma/<repo_name>/

Examples (preferred layout):

  • ComfyUI/models/LLM/TranslateGemma/translategemma-4b-it/
  • ComfyUI/models/LLM/TranslateGemma/translategemma-12b-it/
  • ComfyUI/models/LLM/TranslateGemma/translategemma-27b-it/
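
The preferred/fallback lookup can be pictured with a short sketch. This is not the node's actual code: the function name resolve_model_dir is hypothetical, and the rule "use the legacy folder only when it exists and the preferred one does not" is an assumption about how the fallback behaves.

```python
from pathlib import Path


def resolve_model_dir(models_root: Path, repo_id: str) -> Path:
    """Return the folder for a repo such as 'google/translategemma-4b-it'."""
    repo_name = repo_id.split("/")[-1]
    preferred = models_root / "LLM" / "TranslateGemma" / repo_name
    legacy = models_root / "translate_gemma" / repo_name
    # Assumption: reuse the legacy folder only if it exists and the preferred one does not.
    if not preferred.exists() and legacy.exists():
        return legacy
    return preferred
```

For a fresh install, resolve_model_dir(Path("ComfyUI/models"), "google/translategemma-4b-it") would point at the preferred LLM/TranslateGemma location.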

Manual / Offline Download (Recommended for Restricted Networks)

You can manually download the model files and place them in the folders above. This is useful if auto-download is slow or unreliable due to network restrictions (e.g. firewall/proxy, unstable DNS, or regions where huggingface.co is blocked).

What to do:

  1. On a machine that can access Hugging Face, download the entire model repo snapshot (all files).
  2. Copy the downloaded folder into your ComfyUI models path, for example:
ComfyUI/
  models/
    LLM/
      TranslateGemma/
        translategemma-4b-it/
          config.json
          generation_config.json
          model.safetensors.index.json
          *.safetensors (or pytorch_model*.bin)
          tokenizer_config.json
          special_tokens_map.json
          processor_config.json
          preprocessor_config.json
          chat_template.jinja (if present)
          ... (other files from the repo)
  3. Restart ComfyUI. The node will load from disk and skip downloading if the snapshot is complete.

Notes:

  • Gated models still require accepting the Gemma/TranslateGemma terms on Hugging Face (do this on the download machine).
  • If you copy an incomplete folder, the node may attempt to resume/download missing files when network allows.
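
Before restarting, it can help to sanity-check a manually copied folder. The sketch below is one possible heuristic, not the check the node itself performs: the function name snapshot_looks_complete is hypothetical, and it only verifies that config.json exists and that every weight shard named in model.safetensors.index.json is present.

```python
import json
from pathlib import Path


def snapshot_looks_complete(model_dir: Path) -> bool:
    """Heuristic check that a manually copied model folder is usable."""
    if not (model_dir / "config.json").is_file():
        return False
    index_file = model_dir / "model.safetensors.index.json"
    if index_file.is_file():
        # The index maps each tensor name to the shard file that stores it.
        weight_map = json.loads(index_file.read_text(encoding="utf-8"))["weight_map"]
        return all((model_dir / shard).is_file() for shard in set(weight_map.values()))
    # No index: accept a single-file or legacy weight layout.
    return any(model_dir.glob("*.safetensors")) or any(model_dir.glob("pytorch_model*.bin"))
```

A False result usually means a shard or config file was missed during the copy.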

Node: TranslateGemma

Category: text/translation

Inputs

| Name | Type | Description |
| --- | --- | --- |
| text | STRING | Built-in text input (multiline). Ignored when external_text is connected. Empty/whitespace returns empty output. |
| external_text | STRING | When connected, overrides text (even if empty). Intended for chaining from other nodes. |
| image | IMAGE | If connected, uses multimodal path to translate text from the image. Requires explicit source_language (Auto Detect is not supported for images). |
| image_enhance | BOOLEAN | Mild contrast/sharpening to help small text visibility; may introduce artifacts. Default: false. |
| image_resize_mode | COMBO | letterbox (preserve aspect ratio, recommended) / processor (official resize, may stretch) / stretch (force 896×896, may distort). Default: letterbox. |
| image_two_pass | BOOLEAN | Extract text from image first, then translate extracted text (more accurate, slower). Default: true. |
| target_language | COMBO | Translation target language. Has no effect when chinese_conversion_only=true. |
| source_language | COMBO | Auto Detect is supported for text only. Images require explicit source language. Default: Auto Detect. |
| model_size | COMBO | 4B (fastest) / 12B / 27B trade-off (speed vs quality vs VRAM). Gated repos require HF authentication. See VRAM Notes below for rough estimates. |
| prompt_mode | COMBO | auto (structured first, fallback to plain) / structured (fail if unavailable) / plain (instruction only). Default: auto. |
| max_new_tokens | INT | Maximum output tokens. 0 = Auto (based on input length and remaining context budget). Also clamped by the model context window. Default: 512. |
| max_input_tokens | INT | Input truncation limit. 0 = Auto (reserve room for output within context). Too low can break multimodal inputs/templates. Default: 2048. |
| truncate_input | BOOLEAN | Truncate input if it exceeds max_input_tokens. Disabling this may cause OOM. Default: true. |
| strict_context_limit | BOOLEAN | Clamp output so input+output stays within model context window. Default: true. |
| keep_model_loaded | BOOLEAN | Keep model in memory for faster repeated use; may keep VRAM allocated. Default: true. |
| debug | BOOLEAN | Enable debug logging. Sensitive data redacted by default; set TRANSLATEGEMMA_VERBOSE_DEBUG=1 for full details. Default: false. |
| chinese_conversion_only | BOOLEAN | OpenCC conversion only (Simplified↔Traditional) without loading the model. Text-only; image not supported. Default: false. |
| chinese_conversion_direction | COMBO | auto_flip (detect and flip variant) / to_traditional (force s→t) / to_simplified (force t→s). Default: auto_flip. |
| long_text_strategy | COMBO | disable (default single-call) / auto-continue (continue if model stops early) / segmented (paragraph-by-paragraph). Default: disable. |
| quantization | COMBO | Best-effort VRAM reduction via bitsandbytes (TG-014). none (default) / bnb-8bit (~50% VRAM reduction) / bnb-4bit (~75% VRAM reduction). Requires CUDA + bitsandbytes installed. |
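
The precedence between text and external_text described in the table can be summarized in a few lines. This is an illustrative sketch only: select_input_text is a hypothetical helper, and representing an unconnected external_text socket as None is an assumption made for the example.

```python
from typing import Optional


def select_input_text(text: str, external_text: Optional[str]) -> str:
    """Pick the text to translate; None models an unconnected external_text socket."""
    chosen = text if external_text is None else external_text
    # Empty or whitespace-only input short-circuits to an empty result.
    return chosen if chosen.strip() else ""
```

Note that a connected but empty external_text wins over a non-empty text box, which is why the result is empty in that case.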

Outputs

| Name | Type | Description |
| --- | --- | --- |
| translated_text | STRING | Translated text |

Usage Notes

Text: Auto Detect

TranslateGemma's official chat template requires an explicit source_lang_code. When source_language=Auto Detect, this node performs a best-effort local detection for text inputs. If you see wrong-language behavior, pick the source_language explicitly.

Image Translation Requires Source Language

For images, source_language=Auto Detect is not supported (no OCR pre-pass). Select the correct source_language.

Image Preprocessing (896×896)

For image translation, the node supports multiple preprocessing modes via image_resize_mode:

  • letterbox (default): preserve aspect ratio (no stretching) by padding, then resize
  • processor: rely on the official Gemma3 image processor resize to 896×896 (may stretch)
  • stretch: force resize to 896×896 (may distort)
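
To make the letterbox mode concrete, the sketch below computes where an image would land inside the 896×896 canvas. It is a geometry-only illustration under stated assumptions: letterbox_geometry is a hypothetical name, the padding is assumed to be centered, and the node's real implementation (including its pad color and the order of padding and resizing) may differ.

```python
def letterbox_geometry(width: int, height: int, target: int = 896):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit."""
    # Scale so the longer side exactly fills the target square.
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Assumption: leftover space is split evenly, centering the image.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top
```

A wide 1792×896 screenshot, for instance, is scaled to 896×448 and padded above and below instead of being stretched vertically.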

If small text is missed, try enabling image_enhance=true to apply mild pixel-only enhancement (TG-037).

Enhancement tuning (experimental):

  • TRANSLATEGEMMA_IMAGE_ENHANCE_MODE: gentle (default) or legacy
  • TRANSLATEGEMMA_IMAGE_ENHANCE_CONTRAST: contrast factor (default 1.10)
  • TRANSLATEGEMMA_IMAGE_ENHANCE_SHARPNESS: sharpness factor (default 1.10)
  • TRANSLATEGEMMA_AUTO_MAX_NEW_TOKENS_MAX: optional hard cap for max_new_tokens=0 (Auto) to limit long-form outputs. If unset, Auto is only limited by context budget + other safeguards.

When debug=true, the node prints the path of the preprocessed temporary PNG and keeps it for inspection.

Additionally, when debug=true, the node saves intermediate images under debug/:

  • resize_mode + enhance_mode prefixed files
  • both the resize-only and enhance-applied variants (when enabled)

Note: For image translation, max_input_tokens values that are too small can truncate the model’s visual tokens and cause unrelated outputs. The node enforces a safe minimum when truncation is enabled.

Notes on Chinese Variants

For better Traditional Chinese output consistency, the node maps:

  • Chinese (Simplified) -> zh
  • Chinese (Traditional) -> zh-Hant

When source_language=Auto Detect, the node will try to distinguish Simplified vs Traditional Chinese:

  • Region hints (when available): zh_TW/zh_HK/zh_MO -> zh_Hant, zh_CN/zh_SG/zh_MY -> zh
  • Character-variant heuristic: counts common simplified/traditional characters and picks zh_Hant only when the signal is strong

If the text is too short or ambiguous, Auto Detect may still resolve to zh. For guaranteed behavior, select the desired source_language explicitly.
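
The character-variant heuristic can be approximated as follows. This is a toy sketch, not the node's detector: the character samples, the min_hits and ratio thresholds, and the name detect_chinese_variant are all illustrative assumptions.

```python
# Tiny illustrative samples; a real detector would use much larger tables.
SIMPLIFIED_ONLY = set("们这国说时会对发经")
TRADITIONAL_ONLY = set("們這國說時會對發經")


def detect_chinese_variant(text: str, min_hits: int = 2, ratio: float = 2.0) -> str:
    """Return 'zh_Hant' only on a strong Traditional signal, otherwise 'zh'."""
    simp = sum(ch in SIMPLIFIED_ONLY for ch in text)
    trad = sum(ch in TRADITIONAL_ONLY for ch in text)
    # Require both enough Traditional hits and a clear margin over Simplified hits.
    if trad >= min_hits and trad >= ratio * max(simp, 1):
        return "zh_Hant"
    return "zh"
```

Short inputs such as "你好" contain no distinguishing characters, so a heuristic of this kind falls back to zh, which matches the ambiguity caveat above.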

Tip: If your input is Simplified Chinese but you want Traditional output, set source_language=Auto Detect (or Chinese (Simplified)) and target_language=Chinese (Traditional).

If you still see mixed Simplified/Traditional output when targeting Traditional Chinese, you can enable a best-effort post-edit conversion using OpenCC:

  • Install: pip install opencc-python-reimplemented
  • Default behavior: when target_language=Chinese (Traditional) the node will convert Simplified → Traditional if OpenCC is available
  • Disable: set TRANSLATEGEMMA_TRADITIONAL_POSTEDIT=0

Chinese Conversion-Only Mode (TG-038)

For workflows that only need script conversion (Simplified ↔ Traditional) without translation, enable chinese_conversion_only=true. This mode:

  • Uses OpenCC for fast, deterministic conversion
  • Does not load any translation model (no GPU/VRAM required)
  • Returns converted text immediately with minimal latency
  • Does not require target_language to be a Chinese variant (direction is controlled separately)

Direction selector (chinese_conversion_direction):

  • auto_flip (default): Auto-detect input variant and convert to the opposite script
    • Input Simplified → output Traditional
    • Input Traditional → output Simplified
    • Returns an error if input is ambiguous (ask user to force direction)
  • to_traditional: Force Simplified → Traditional (s2t)
  • to_simplified: Force Traditional → Simplified (t2s)

Requirements:

  • Install OpenCC: pip install opencc-python-reimplemented
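
For the two forced directions, the conversion reduces to a direct OpenCC call. The sketch below assumes opencc-python-reimplemented is installed; convert_chinese and DIRECTION_TO_OPENCC are hypothetical names, and auto_flip is deliberately left out because its detection logic is not specified here.

```python
# Mapping from the node's direction values to OpenCC configuration names.
DIRECTION_TO_OPENCC = {"to_traditional": "s2t", "to_simplified": "t2s"}


def convert_chinese(text: str, direction: str) -> str:
    """Convert between scripts with OpenCC for a forced direction."""
    if direction not in DIRECTION_TO_OPENCC:
        raise ValueError(f"Unsupported direction: {direction!r}")
    # Imported lazily so the mapping can be inspected without OpenCC installed.
    from opencc import OpenCC  # provided by opencc-python-reimplemented

    return OpenCC(DIRECTION_TO_OPENCC[direction]).convert(text)
```

Calling convert_chinese(text, "to_traditional") runs the s2t configuration and never touches the translation model.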

Limitations:

  • Text-only: if image is connected, returns an error (use normal translation mode for images)
  • No cross-language translation (e.g., English → Chinese still requires the model)
  • auto_flip may fail on short/ambiguous inputs; use forced direction in those cases

When to use:

  • You have Chinese text and only need to change the script variant
  • You want to avoid model download/load overhead
  • You need deterministic, reproducible output (no LLM randomness)

Long Text Strategy (TG-050)

For long texts, the model may stop early (emitting <end_of_turn>) before completing the translation. The long_text_strategy option keeps the default single-call behavior (disable) and adds two mitigation strategies (auto-continue and segmented):

disable (default): Standard single-call behavior. Suitable for most inputs.

auto-continue (also accepts auto_continue): Best-effort continuation when the model stops early on long input.

  • Only triggers when: input is long (≥512 tokens), model stopped via <end_of_turn>, and input was not truncated.
  • Prompts the model to continue the translation (up to 2 additional rounds).
  • Uses overlap trimming to avoid duplicated text at continuation boundaries.
  • Trade-off: may increase latency (2–3× model calls), but improves completeness for long texts.
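
Overlap trimming at a continuation boundary might look like the sketch below. It is an assumption-laden illustration: trim_overlap and its max_overlap window are hypothetical, and the node may match on tokens rather than characters.

```python
def trim_overlap(previous: str, continuation: str, max_overlap: int = 200) -> str:
    """Drop the prefix of `continuation` that repeats the tail of `previous`."""
    limit = min(max_overlap, len(previous), len(continuation))
    # Try the longest candidate overlap first, then shrink.
    for size in range(limit, 0, -1):
        if previous.endswith(continuation[:size]):
            return continuation[size:]
    return continuation
```

If the first round ends with "sat on" and the continuation restarts with the same words, only the new tail is appended.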

segmented: Translate paragraph-by-paragraph.

  • Splits input by blank lines (preserves original separators).
  • Translates each paragraph in a separate model call.
  • Reassembles with original blank lines preserved.
  • Trade-off: slower (N model calls for N paragraphs), but handles very long documents and preserves paragraph structure.
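
The split, translate, and reassemble steps can be sketched with a regular expression that keeps the separators. This is a minimal illustration, not the node's implementation: translate_segmented is a hypothetical name, and the translate callable stands in for one model call per paragraph.

```python
import re
from typing import Callable


def translate_segmented(text: str, translate: Callable[[str], str]) -> str:
    """Translate paragraph by paragraph, keeping blank-line separators intact."""
    # The capturing group makes re.split return the separators as well.
    parts = re.split(r"(\n[ \t]*\n+)", text)
    out = []
    for index, part in enumerate(parts):
        is_separator = index % 2 == 1
        # Separators and empty fragments pass through untouched.
        out.append(part if is_separator or not part.strip() else translate(part))
    return "".join(out)
```

Passing str.upper as a stand-in translator shows that paragraph breaks, including runs of several blank lines, survive unchanged.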

When to use:

| Scenario | Recommended |
| --- | --- |
| Short/medium text (<2000 chars) | disable |
| Long text that sometimes truncates early | auto-continue |
| Very long document with many paragraphs | segmented |
| Speed is critical | disable |

Recommended settings for long documents:

  • Set max_input_tokens=0 and max_new_tokens=0 (Auto) so the node stays context-aware.
  • If you see early stops with incomplete output: try long_text_strategy=auto-continue.
  • For very long documents or many paragraphs: try long_text_strategy=segmented (more robust, but slower).
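
As a rough mental model of how Auto and strict_context_limit interact, consider the sketch below. It is a simplification under stated assumptions: resolve_max_new_tokens is a hypothetical helper, the context window size is passed in rather than taken from any real model config, and the node's Auto mode also considers input length and other safeguards that are not modeled here.

```python
def resolve_max_new_tokens(
    input_tokens: int,
    max_new_tokens: int,
    context_window: int,
    strict_context_limit: bool = True,
) -> int:
    """Return the output token budget for one generation call."""
    remaining = max(context_window - input_tokens, 0)
    if max_new_tokens == 0:
        # Auto (simplified): use whatever room is left in the context window.
        return remaining
    if strict_context_limit:
        # Clamp so that input + output stays inside the context window.
        return min(max_new_tokens, remaining)
    return max_new_tokens
```

With a fixed max_new_tokens=512 and an input that nearly fills the window, the clamp shrinks the output budget, which is one way a long translation ends up cut short.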

Limitations:

  • Text-only for v1 (image path not affected).
  • segmented mode has higher latency for many-paragraph documents.
  • auto-continue continuation quality depends on model; may occasionally repeat or diverge.

Language Code Normalization

The node accepts both _ and - variants for language codes (e.g., zh_Hant and zh-Hant). Internally, codes are normalized to match the official TranslateGemma template format.

If an unsupported language is passed, the node prints a warning and defaults to English. Set TRANSLATEGEMMA_STRICT_LANG=1 to raise an error instead.
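
The normalization and fallback behavior could be approximated like this. The sketch makes several assumptions: normalize_lang_code is a hypothetical name, SUPPORTED is a tiny illustrative subset of the 55 languages, the hyphenated form is assumed to be the canonical one, and warnings.warn stands in for the node's printed warning.

```python
import os
import warnings

# Illustrative subset only; the node supports 55 languages.
SUPPORTED = {"en", "fr", "ja", "zh", "zh-Hant"}


def normalize_lang_code(code: str) -> str:
    """Normalize separators and casing, then validate against SUPPORTED."""
    normalized = code.strip().replace("_", "-")
    by_lower = {item.lower(): item for item in SUPPORTED}
    match = by_lower.get(normalized.lower())
    if match is not None:
        return match
    if os.environ.get("TRANSLATEGEMMA_STRICT_LANG") == "1":
        raise ValueError(f"Unsupported language code: {code!r}")
    warnings.warn(f"Unsupported language code {code!r}; defaulting to English.")
    return "en"
```

Under these assumptions, zh_Hant and zh-Hant both resolve to the same code, while an unknown code falls back to en unless strict mode is set.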

Default Settings (TG-032)

The following are the authoritative default values for node inputs:

| Setting | Default | Notes |
| --- | --- | --- |
| model_size | 4B | Smallest, fastest |
| max_new_tokens | 512 | Use 0 for auto-sizing |
| max_input_tokens | 2048 | Input truncation limit (0 = Auto) |
| keep_model_loaded | true | Avoids reload overhead |
| truncate_input | true | Prevents OOM on long texts |
| debug | false | Enable for diagnostics |
| image_resize_mode | letterbox | Preserves aspect ratio |
| image_enhance | false | Enables contrast/sharpening |
| image_two_pass | true | Extract then translate |
| chinese_conversion_only | false | OpenCC conversion without model |
| chinese_conversion_direction | auto_flip | Auto-detect and flip variant |
| long_text_strategy | disable | Single-call (no continuation) |
| quantization | none | No quantization (full precision) |

Performance Tips

  • Leave keep_model_loaded=true for repeated use (avoids reload time).
  • Use the 4B model if you are unsure about hardware limits.
  • First run is slower due to download and weight initialization.

VRAM Notes (Native Models)

  • Rough starting point (varies by GPU, dtype, drivers, and context length):
    • 4B model: ~12 GB
    • 12B model: ~27 GB
    • 27B model: ~56 GB

Quantization (bitsandbytes) — TG-014

Best-effort VRAM reduction for running larger models (12B/27B) on consumer GPUs.

How It Works

The quantization input allows you to load the model in lower precision using bitsandbytes:

| Mode | VRAM Reduction | Quality | Notes |
| --- | --- | --- | --- |
| none (default) | - | Best | Full precision (BF16/FP16) |
| bnb-8bit | ~50% | Good | 8-bit quantization |
| bnb-4bit | ~75% | Acceptable | 4-bit NF4 quantization |
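
Combining these reduction figures with the rough native estimates from VRAM Notes gives a back-of-the-envelope calculator. Treat it as arithmetic on the approximate numbers quoted in this README, not as measured results; estimate_vram_gb is a hypothetical helper, and real usage varies with GPU, dtype, and context length.

```python
# Rough native estimates (GB) and approximate reduction factors from this README.
NATIVE_VRAM_GB = {"4B": 12, "12B": 27, "27B": 56}
REDUCTION = {"none": 0.0, "bnb-8bit": 0.50, "bnb-4bit": 0.75}


def estimate_vram_gb(model_size: str, quantization: str = "none") -> float:
    """Apply the approximate reduction factor to the native estimate."""
    return NATIVE_VRAM_GB[model_size] * (1.0 - REDUCTION[quantization])
```

By this estimate the 12B model drops from about 27 GB to about 6.75 GB with bnb-4bit, and the 27B model to about 28 GB with bnb-8bit.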

Requirements

  • CUDA GPU: bitsandbytes quantization only works on NVIDIA GPUs with CUDA
  • bitsandbytes installed: pip install bitsandbytes
  • transformers with BitsAndBytesConfig: pip install --upgrade transformers

Troubleshooting

"bitsandbytes quantization requires a CUDA GPU":

  • You're running on CPU or MPS (Apple Silicon)
  • Set quantization=none or use a CUDA-capable GPU

"bitsandbytes not installed":

  • Install: pip install bitsandbytes
  • Windows users: see bitsandbytes-windows-webui for prebuilt wheels (third-party, evaluate risk yourself)
  • ComfyUI Desktop users: quantization may require manual bitsandbytes installation; if install fails, use quantization=none or run the 4B model

"BitsAndBytesConfig not found":

  • Upgrade transformers: pip install --upgrade transformers

"CUDA Setup failed" or "libbitsandbytes_cudaXXX not found" (import succeeds but loading fails):

  • This means bitsandbytes was built for a different CUDA version or your GPU's compute capability is unsupported
  • Set quantization=none as a workaround
  • Include the full error message when reporting issues

Environment Variables (Advanced)

  • TRANSLATEGEMMA_BNB_4BIT_COMPUTE_DTYPE: Force compute dtype for 4-bit (bf16 or fp16). Default: auto-detect.
  • TRANSLATEGEMMA_BNB_4BIT_DOUBLE_QUANT: Enable double quantization (1 = enabled, 0 = disabled). Default: 1.
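
One way these settings could map onto transformers' BitsAndBytesConfig is sketched below. The helper bnb_config_kwargs is hypothetical and only builds keyword arguments, so it runs without transformers installed; how the node actually reads the environment variables and picks a compute dtype is an assumption.

```python
import os


def bnb_config_kwargs(quantization: str) -> dict:
    """Build keyword arguments for transformers.BitsAndBytesConfig."""
    if quantization == "bnb-8bit":
        return {"load_in_8bit": True}
    if quantization == "bnb-4bit":
        double_quant = os.environ.get("TRANSLATEGEMMA_BNB_4BIT_DOUBLE_QUANT", "1") == "1"
        kwargs = {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_use_double_quant": double_quant,
        }
        # Compute dtype is left to auto-detection unless the override is set.
        dtype = os.environ.get("TRANSLATEGEMMA_BNB_4BIT_COMPUTE_DTYPE")
        if dtype in ("bf16", "fp16"):
            kwargs["bnb_4bit_compute_dtype"] = "bfloat16" if dtype == "bf16" else "float16"
        return kwargs
    # quantization == "none": no quantization config at all.
    return {}
```

On a CUDA machine the result would typically be passed as BitsAndBytesConfig(**bnb_config_kwargs("bnb-4bit")) and supplied to from_pretrained through its quantization_config argument.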

Limitations

  • Quantization is best-effort: TranslateGemma official docs do not explicitly promise bitsandbytes support
  • Translation quality may degrade slightly with quantization
  • Not supported on CPU or MPS (Apple Silicon) — only CUDA GPUs
  • Windows/Desktop users may encounter install issues with bitsandbytes

Security / Reproducibility Notes

Remote Code Policy (TG-026)

  • The loader attempts trust_remote_code=False first and only falls back to True if required.
  • Set TRANSLATEGEMMA_ALLOW_REMOTE_CODE=0 to deny remote code entirely (fails if code is needed).
  • Set TRANSLATEGEMMA_REMOTE_CODE_ALLOWLIST=google/translategemma-4b-it,google/translategemma-12b-it to allow only specific repos.
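
The interaction between the two variables can be sketched as a single decision function. This is an illustration, not the loader's code: remote_code_allowed is a hypothetical name, and treating an unset or empty allowlist as "no restriction" is an assumption.

```python
import os


def remote_code_allowed(repo_id: str) -> bool:
    """Decide whether a trust_remote_code=True fallback is permitted."""
    if os.environ.get("TRANSLATEGEMMA_ALLOW_REMOTE_CODE") == "0":
        return False
    raw = os.environ.get("TRANSLATEGEMMA_REMOTE_CODE_ALLOWLIST", "")
    allowlist = {item.strip() for item in raw.split(",") if item.strip()}
    # Assumption: an empty or unset allowlist places no restriction on repos.
    return not allowlist or repo_id in allowlist
```

With the allowlist from the example above, the 27B repo would be refused the fallback while the 4B and 12B repos would still be permitted.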

Revision Pinning

  • You can pin a specific revision for reproducibility via TRANSLATEGEMMA_REVISION=<commit-or-tag>.

Debug Privacy (TG-028)

  • By default, debug=true redacts sensitive data (user text content, full filesystem paths).
  • Set TRANSLATEGEMMA_VERBOSE_DEBUG=1 to enable full diagnostics (for troubleshooting).

Download Recovery

  • If a download is interrupted, the loader auto-resumes on next run.
  • If corruption persists, delete the model folder under ComfyUI/models/LLM/TranslateGemma/ and retry.

License

This repository is licensed under the MIT License (see LICENSE). TranslateGemma model weights are governed by Google's Gemma Terms of Use.
