A ComfyUI integration for TranslateGemma — Google's new open-source translation model family built on Gemma 3. It supports 55 languages, multimodal image-to-text translation, and efficient inference at sizes suited to mobile (4B), local (12B), and cloud (27B) deployment.
TranslateGemma: A new suite of open translation models
01/2026 update:
- Added `chinese_conversion_only` + `chinese_conversion_direction` for fast Simplified↔Traditional conversion via OpenCC (no model load).
- Added `max_new_tokens=0` / `max_input_tokens=0` as Auto token budgeting (context-aware).
- Added `long_text_strategy` (disable / auto-continue / segmented) to mitigate "early stop" on long documents.
- Added optional BitsAndBytes quantization (`quantization`: none / bnb-8bit / bnb-4bit) for best-effort VRAM reduction.
- Added a node UI `?` help modal and `0 = auto` labels for the max token widgets.
- Improved Hugging Face download diagnostics (network/auth/disk hints + retries) and added troubleshooting guidance (proxy/mirror/offline).
- Features
- Installation
- Hugging Face Access (Gated Models)
- Model Storage Location
- Node: TranslateGemma
- Usage Notes
- Default Settings (TG-032)
- Performance Tips
- VRAM Notes (Native Models)
- Quantization (bitsandbytes) — TG-014
- Security / Reproducibility Notes
- License
## Features

- Text translation across 55 languages
- Model size selection: 4B / 12B / 27B
- First-run auto download via Hugging Face (requires accepting Gemma terms)
- Flexible inputs: built-in text box + external string input
- Optional image input: translate text found in images (multimodal)
- Chinese conversion-only mode: Simplified↔Traditional conversion via OpenCC without loading the model (TG-038)
## Installation

Via ComfyUI-Manager:
- Open ComfyUI-Manager.
- Search for `TranslateGemma`.
- Install and restart ComfyUI.
Manual install:
- Clone into your ComfyUI `custom_nodes` directory (from your ComfyUI root):
  `cd custom_nodes`
  `git clone https://github.com/rookiestar28/ComfyUI-TranslateGemma.git`
- Install dependencies:
  `cd ComfyUI-TranslateGemma`
  `pip install -r requirements.txt`
- Restart ComfyUI.
## Hugging Face Access (Gated Models)

TranslateGemma repos are gated under the Gemma terms.
- Visit the model pages and accept the license terms: `google/translategemma-4b-it`, `google/translategemma-12b-it`, `google/translategemma-27b-it`
- Authenticate (recommended): run `hf auth login`. Alternatively, set one of these environment variables for the ComfyUI process: `HF_TOKEN` or `HUGGINGFACE_HUB_TOKEN`.
- Restart ComfyUI after changing authentication.
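To confirm which credential the ComfyUI process will actually pick up, a small sketch (`find_hf_token` is an illustrative helper, not part of the node; the real loader may resolve tokens differently):

```python
import os

def find_hf_token():
    """Return the first Hugging Face token found in the environment.

    Checks HF_TOKEN first, then HUGGINGFACE_HUB_TOKEN (the two variables
    listed above); returns None when neither is set.
    """
    for var in ("HF_TOKEN", "HUGGINGFACE_HUB_TOKEN"):
        value = os.environ.get(var, "").strip()
        if value:
            return value
    return None
```

Note that the variable must be set in the environment of the ComfyUI process itself, not just in your interactive shell.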
If the model download stalls at `Fetching ...` or fails with connection errors, it is usually not a node bug. Common causes: unstable network, corporate firewall/proxy, DNS issues, or regions where huggingface.co is blocked (some China networks).
Things to try:
- Retry: downloads are resumable; restarting ComfyUI often continues where it left off.
- Proxy: set `HTTP_PROXY` / `HTTPS_PROXY` for the ComfyUI process.
- Mirror endpoint (community): set `HF_ENDPOINT` (or `HUGGINGFACE_HUB_ENDPOINT`) to a mirror URL, then restart ComfyUI.
- Offline/manual: download the model on a machine that can reach Hugging Face, then copy the downloaded folder into the model cache directory (see Model Storage Location below) and restart ComfyUI.
Notes:
- Community mirrors are not official; availability and correctness are not guaranteed.
- If you see `401`/`403`/gated/forbidden errors, you likely need to accept the license and/or set `HF_TOKEN`.
## Model Storage Location

Models are stored under ComfyUI's models directory in a per-repo folder:
- Preferred: `ComfyUI/models/LLM/TranslateGemma/<repo_name>/`
- Fallback (legacy): `ComfyUI/models/translate_gemma/<repo_name>/`

Examples:
- `ComfyUI/models/LLM/TranslateGemma/translategemma-4b-it/`
- `ComfyUI/models/LLM/TranslateGemma/translategemma-12b-it/`
- `ComfyUI/models/LLM/TranslateGemma/translategemma-27b-it/`
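The preferred/fallback lookup can be sketched as follows (`resolve_model_dir` is a hypothetical helper for illustration; the node's actual resolution logic may differ):

```python
from pathlib import Path

def resolve_model_dir(models_root: Path, repo_name: str) -> Path:
    """Pick the per-repo folder: prefer the LLM/TranslateGemma layout,
    fall back to the legacy translate_gemma layout if only that exists."""
    preferred = models_root / "LLM" / "TranslateGemma" / repo_name
    legacy = models_root / "translate_gemma" / repo_name
    if preferred.is_dir():
        return preferred
    if legacy.is_dir():
        return legacy
    return preferred  # fresh installs download into the preferred path
```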
You can also manually download the model files and place them into the folders above.
This is useful if auto-download is slow/unreliable due to network restrictions (e.g. firewall/proxy, unstable DNS, or
regions where huggingface.co is blocked).
What to do:
- On a machine that can access Hugging Face, download the entire model repo snapshot (all files).
- Copy the downloaded folder into your ComfyUI models path, for example:

      ComfyUI/
        models/
          LLM/
            TranslateGemma/
              translategemma-4b-it/
                config.json
                generation_config.json
                model.safetensors.index.json
                *.safetensors (or pytorch_model*.bin)
                tokenizer_config.json
                special_tokens_map.json
                processor_config.json
                preprocessor_config.json
                chat_template.jinja (if present)
                ... (other files from the repo)

- Restart ComfyUI. The node will load from disk and skip downloading if the snapshot is complete.
Notes:
- Gated models still require accepting the Gemma/TranslateGemma terms on Hugging Face (do this on the download machine).
- If you copy an incomplete folder, the node may attempt to resume/download missing files when network allows.
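A rough way to sanity-check a manually copied snapshot before restarting ComfyUI (the file list here is an illustrative subset, not the node's exact completeness check):

```python
from pathlib import Path

# Metadata files a snapshot needs before the loader can stay offline;
# illustrative subset only — the real check may cover more files.
REQUIRED_FILES = ("config.json", "tokenizer_config.json")

def snapshot_looks_complete(folder: Path) -> bool:
    """True if the required metadata files and at least one weight shard
    (*.safetensors or pytorch_model*.bin) are present."""
    if not all((folder / name).is_file() for name in REQUIRED_FILES):
        return False
    weights = list(folder.glob("*.safetensors")) + list(folder.glob("pytorch_model*.bin"))
    return bool(weights)
```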
## Node: TranslateGemma

Category: `text/translation`

### Inputs

| Name | Type | Description |
|---|---|---|
| `text` | STRING | Built-in text input (multiline). Ignored when `external_text` is connected. Empty/whitespace returns empty output. |
| `external_text` | STRING | When connected, overrides `text` (even if empty). Intended for chaining from other nodes. |
| `image` | IMAGE | If connected, uses the multimodal path to translate text found in the image. Requires an explicit `source_language` (Auto Detect is not supported for images). |
| `image_enhance` | BOOLEAN | Mild contrast/sharpening to help small-text visibility; may introduce artifacts. Default: false. |
| `image_resize_mode` | COMBO | `letterbox` (preserve aspect ratio, recommended) / `processor` (official resize, may stretch) / `stretch` (force 896×896, may distort). Default: `letterbox`. |
| `image_two_pass` | BOOLEAN | Extract text from the image first, then translate the extracted text (more accurate, slower). Default: true. |
| `target_language` | COMBO | Translation target language. Does not affect `chinese_conversion_only=true`. |
| `source_language` | COMBO | Auto Detect is supported for text only. Images require an explicit source language. Default: Auto Detect. |
| `model_size` | COMBO | 4B (fastest) / 12B / 27B trade-off (speed vs quality vs VRAM). Gated repos require HF authentication. See VRAM Notes below for rough estimates. |
| `prompt_mode` | COMBO | `auto` (structured first, fallback to plain) / `structured` (fail if unavailable) / `plain` (instruction only). Default: `auto`. |
| `max_new_tokens` | INT | Maximum output tokens. 0 = Auto (based on input length and remaining context budget). Also clamped by the model context window. Default: 512. |
| `max_input_tokens` | INT | Input truncation limit. 0 = Auto (reserve room for output within the context window). Too low a value can break multimodal inputs/templates. Default: 2048. |
| `truncate_input` | BOOLEAN | Truncate input that exceeds `max_input_tokens`. Disabling may cause OOM. Default: true. |
| `strict_context_limit` | BOOLEAN | Clamp output so input + output stays within the model context window. Default: true. |
| `keep_model_loaded` | BOOLEAN | Keep the model in memory for faster repeated use; may keep VRAM allocated. Default: true. |
| `debug` | BOOLEAN | Enable debug logging. Sensitive data is redacted by default; set `TRANSLATEGEMMA_VERBOSE_DEBUG=1` for full details. Default: false. |
| `chinese_conversion_only` | BOOLEAN | OpenCC conversion only (Simplified↔Traditional) without loading the model. Text-only; image input is not supported. Default: false. |
| `chinese_conversion_direction` | COMBO | `auto_flip` (detect and flip variant) / `to_traditional` (force s→t) / `to_simplified` (force t→s). Default: `auto_flip`. |
| `long_text_strategy` | COMBO | `disable` (default single call) / `auto-continue` (continue if the model stops early) / `segmented` (paragraph by paragraph). Default: `disable`. |
| `quantization` | COMBO | Best-effort VRAM reduction via bitsandbytes (TG-014). `none` (default) / `bnb-8bit` (~50% VRAM reduction) / `bnb-4bit` (~75% VRAM reduction). Requires CUDA + bitsandbytes installed. |
### Outputs

| Name | Type | Description |
|---|---|---|
| `translated_text` | STRING | Translated text |
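The `0 = Auto` budgeting used by `max_new_tokens` / `max_input_tokens` can be sketched as follows (illustrative formula with an assumed `reserve` margin; the node's exact accounting may differ):

```python
def output_token_budget(input_tokens: int, context_window: int,
                        max_new_tokens: int = 0, reserve: int = 16) -> int:
    """Context-aware output budget.

    max_new_tokens=0 (Auto) spends whatever context remains after the
    input, minus a small reserve; a positive value is still clamped so
    input + output fits the context window (strict_context_limit).
    """
    remaining = max(1, context_window - input_tokens - reserve)
    if max_new_tokens > 0:
        return min(max_new_tokens, remaining)
    return remaining
```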
## Usage Notes

TranslateGemma's official chat template requires an explicit `source_lang_code`.
When `source_language=Auto Detect`, this node performs a best-effort local detection for text inputs.
If you see wrong-language behavior, pick the `source_language` explicitly.
For images, `source_language=Auto Detect` is not supported (there is no OCR pre-pass). Select the correct `source_language`.
For image translation, the node supports multiple preprocessing modes via `image_resize_mode`:
- `letterbox` (default): preserve aspect ratio (no stretching) by padding, then resize
- `processor`: rely on the official Gemma3 image processor resize to 896×896 (may stretch)
- `stretch`: force resize to 896×896 (may distort)

If small text is missed, try enabling `image_enhance=true` to apply mild pixel-only enhancement (TG-037).

Enhancement tuning (experimental):
- `TRANSLATEGEMMA_IMAGE_ENHANCE_MODE`: `gentle` (default) or `legacy`
- `TRANSLATEGEMMA_IMAGE_ENHANCE_CONTRAST`: contrast factor (default `1.10`)
- `TRANSLATEGEMMA_IMAGE_ENHANCE_SHARPNESS`: sharpness factor (default `1.10`)
- `TRANSLATEGEMMA_AUTO_MAX_NEW_TOKENS_MAX`: optional hard cap for `max_new_tokens=0` (Auto) to limit long-form outputs. If unset, Auto is limited only by the context budget and other safeguards.

When `debug=true`, the node prints the path of the preprocessed temporary PNG and keeps it for inspection.
Additionally, when `debug=true`, the node saves intermediate images under `debug/`:
- files prefixed with the resize mode and enhance mode
- both the resize-only and enhance-applied variants (when enabled)

Note: For image translation, `max_input_tokens` values that are too small can truncate the model's visual tokens and cause unrelated outputs. The node enforces a safe minimum when truncation is enabled.
For better Traditional Chinese output consistency, the node maps:
- Chinese (Simplified) -> `zh`
- Chinese (Traditional) -> `zh-Hant`

When `source_language=Auto Detect`, the node will try to distinguish Simplified vs Traditional Chinese:
- Region hints (when available): `zh_TW`/`zh_HK`/`zh_MO` -> `zh_Hant`; `zh_CN`/`zh_SG`/`zh_MY` -> `zh`
- Character-variant heuristic: counts common simplified/traditional characters and picks `zh_Hant` only when the signal is strong

If the text is too short or ambiguous, Auto Detect may still resolve to `zh`. For guaranteed behavior, select the desired `source_language` explicitly.

Tip: If your input is Simplified Chinese but you want Traditional output, set `source_language=Auto Detect` (or Chinese (Simplified)) and `target_language=Chinese (Traditional)`.
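The character-variant heuristic can be sketched like this (the character sets and `margin` threshold are illustrative; the node uses its own tables and tuning):

```python
# Tiny illustrative samples of characters that differ between scripts;
# a real detector uses much larger tables.
SIMPLIFIED_ONLY = set("国发记爱们语东华")
TRADITIONAL_ONLY = set("國發記愛們語東華")

def guess_chinese_variant(text: str, margin: int = 2) -> str:
    """Count variant-specific characters and pick zh_Hant only when the
    Traditional signal is clearly stronger ('strong signal' rule);
    short or ambiguous text falls back to zh."""
    s = sum(ch in SIMPLIFIED_ONLY for ch in text)
    t = sum(ch in TRADITIONAL_ONLY for ch in text)
    return "zh_Hant" if t - s >= margin else "zh"
```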
If you still see mixed Simplified/Traditional output when targeting Traditional Chinese, you can enable a best-effort post-edit conversion using OpenCC:
- Install: `pip install opencc-python-reimplemented`
- Default behavior: when `target_language=Chinese (Traditional)`, the node converts Simplified → Traditional if OpenCC is available
- Disable: set `TRANSLATEGEMMA_TRADITIONAL_POSTEDIT=0`
For workflows that only need script conversion (Simplified ↔ Traditional) without translation, enable chinese_conversion_only=true. This mode:
- Uses OpenCC for fast, deterministic conversion
- Does not load any translation model (no GPU/VRAM required)
- Returns converted text immediately with minimal latency
- Does not require `target_language` to be a Chinese variant (direction is controlled separately)
Direction selector (`chinese_conversion_direction`):
- `auto_flip` (default): auto-detect the input variant and convert to the opposite script
  - Input Simplified → output Traditional
  - Input Traditional → output Simplified
  - Returns an error if the input is ambiguous (force a direction in that case)
- `to_traditional`: force Simplified → Traditional (s2t)
- `to_simplified`: force Traditional → Simplified (t2s)
Requirements:
- Install OpenCC: `pip install opencc-python-reimplemented`
Limitations:
- Text-only: if `image` is connected, the node returns an error (use normal translation mode for images)
- No cross-language translation (e.g., English → Chinese still requires the model)
- `auto_flip` may fail on short/ambiguous inputs; use a forced direction in those cases
When to use:
- You have Chinese text and only need to change the script variant
- You want to avoid model download/load overhead
- You need deterministic, reproducible output (no LLM randomness)
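The direction selector semantics can be expressed as a small mapping to OpenCC config names (`resolve_conversion` is a hypothetical helper mirroring the rules above; the actual conversion step still requires `opencc-python-reimplemented`):

```python
def resolve_conversion(direction: str, looks_traditional):
    """Map the node's direction selector onto an OpenCC config name.

    looks_traditional is a prior variant guess (True/False), or None when
    the input is ambiguous; auto_flip then raises, matching the
    behaviour described above.
    """
    if direction == "to_traditional":
        return "s2t"
    if direction == "to_simplified":
        return "t2s"
    if direction == "auto_flip":
        if looks_traditional is None:
            raise ValueError("ambiguous input: force a direction")
        return "t2s" if looks_traditional else "s2t"
    raise ValueError(f"unknown direction: {direction}")
```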
For long texts, the model may stop early (emitting `<end_of_turn>`) before completing the translation. The `long_text_strategy` option provides two mitigation strategies beyond the default:

`disable` (default): Standard single-call behavior. Suitable for most inputs.

`auto-continue` (also accepts `auto_continue`): Best-effort continuation when the model stops early on long input.
- Only triggers when the input is long (≥512 tokens), the model stopped via `<end_of_turn>`, and the input was not truncated.
- Prompts the model to continue the translation (up to 2 additional rounds).
- Uses overlap trimming to avoid duplicated text at continuation boundaries.
- Trade-off: may increase latency (2–3× model calls), but improves completeness for long texts.
segmented: Translate paragraph-by-paragraph.
- Splits input by blank lines (preserves original separators).
- Translates each paragraph in a separate model call.
- Reassembles with original blank lines preserved.
- Trade-off: slower (N model calls for N paragraphs), but handles very long documents and preserves paragraph structure.
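The split/reassemble round trip of the segmented strategy can be sketched as follows (illustrative, assuming non-empty input; the node's splitter may differ in detail):

```python
import re

def split_paragraphs(text: str):
    """Split on blank lines, keeping the original separators so the
    translation can be reassembled exactly."""
    parts = re.split(r"(\n\s*\n)", text)
    paragraphs = parts[0::2]   # text chunks
    separators = parts[1::2]   # blank-line runs between them
    return paragraphs, separators

def reassemble(translated, separators):
    """Interleave translated paragraphs with the original separators."""
    out = [translated[0]]
    for sep, para in zip(separators, translated[1:]):
        out.append(sep)
        out.append(para)
    return "".join(out)
```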
When to use:
| Scenario | Recommended |
|---|---|
| Short/medium text (<2000 chars) | disable |
| Long text that sometimes truncates early | auto-continue |
| Very long document with many paragraphs | segmented |
| Speed is critical | disable |
Recommended settings for long documents:
- Set `max_input_tokens=0` and `max_new_tokens=0` (Auto) so the node stays context-aware.
- If you see early stops with incomplete output, try `long_text_strategy=auto-continue`.
- For very long documents or many paragraphs, try `long_text_strategy=segmented` (more robust, but slower).
Limitations:
- Text-only for v1 (the image path is not affected).
- `segmented` mode has higher latency for many-paragraph documents.
- `auto-continue` continuation quality depends on the model; it may occasionally repeat or diverge.
The node accepts both `_` and `-` variants for language codes (e.g., `zh_Hant` and `zh-Hant`). Internally, codes are normalized to match the official TranslateGemma template format.
If an unsupported language is passed, the node prints a warning and defaults to English. Set `TRANSLATEGEMMA_STRICT_LANG=1` to raise an error instead.
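The `_`/`-` normalization can be sketched like this (illustrative; the node ultimately maps onto the official template's exact code list):

```python
def normalize_lang_code(code: str) -> str:
    """Accept zh_Hant / zh-Hant style codes and emit one canonical form:
    lowercase language, Titlecase 4-letter script, uppercase region."""
    parts = code.strip().replace("_", "-").split("-")
    out = [parts[0].lower()]
    for part in parts[1:]:
        out.append(part.title() if len(part) == 4 else part.upper())
    return "-".join(out)
```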
## Default Settings (TG-032)

The following are the authoritative default values for node inputs:

| Setting | Default | Notes |
|---|---|---|
| `model_size` | `4B` | Smallest, fastest |
| `max_new_tokens` | `512` | Use `0` for auto-sizing |
| `max_input_tokens` | `2048` | Input truncation limit (`0` = Auto) |
| `keep_model_loaded` | `true` | Avoids reload overhead |
| `truncate_input` | `true` | Prevents OOM on long texts |
| `debug` | `false` | Enable for diagnostics |
| `image_resize_mode` | `letterbox` | Preserves aspect ratio |
| `image_enhance` | `false` | Enables contrast/sharpening |
| `image_two_pass` | `true` | Extract, then translate |
| `chinese_conversion_only` | `false` | OpenCC conversion without the model |
| `chinese_conversion_direction` | `auto_flip` | Auto-detect and flip variant |
| `long_text_strategy` | `disable` | Single call (no continuation) |
| `quantization` | `none` | No quantization (full precision) |
## Performance Tips

- Leave `keep_model_loaded=true` for repeated use (avoids reload time).
- Use the 4B model if you are unsure about hardware limits.
- First run is slower due to download and weight initialization.
## VRAM Notes (Native Models)

Rough starting points (varies by GPU, dtype, drivers, and context length):
- 4B model: ~12 GB
- 12B model: ~27 GB
- 27B model: ~56 GB
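These figures can be sanity-checked from parameter counts: weights alone need roughly parameters × bytes per parameter, and the runtime adds KV cache, activations, and framework overhead on top (which is why the numbers above are higher than the raw weight sizes):

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Weight-only footprint: parameter count x bytes per parameter.
    BF16/FP16 is ~2 bytes; 8-bit ~1 byte; 4-bit ~0.5 bytes."""
    return params_billion * bytes_per_param

# BF16 weights alone: 4B -> 8 GB, 12B -> 24 GB, 27B -> 54 GB
```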
## Quantization (bitsandbytes) — TG-014

Best-effort VRAM reduction for running larger models (12B/27B) on consumer GPUs. The `quantization` input loads the model in lower precision using bitsandbytes:

| Mode | VRAM Reduction | Quality | Notes |
|---|---|---|---|
| `none` (default) | — | Best | Full precision (BF16/FP16) |
| `bnb-8bit` | ~50% | Good | 8-bit quantization |
| `bnb-4bit` | ~75% | Acceptable | 4-bit NF4 quantization |

Requirements:
- CUDA GPU: bitsandbytes quantization only works on NVIDIA GPUs with CUDA
- bitsandbytes installed: `pip install bitsandbytes`
- transformers with `BitsAndBytesConfig`: `pip install --upgrade transformers`
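For reference, the two modes correspond to standard transformers `BitsAndBytesConfig` setups roughly like the following (the parameter values shown are common defaults, not necessarily what the node uses internally):

```python
import torch
from transformers import BitsAndBytesConfig

# bnb-8bit: plain 8-bit weight quantization.
eight_bit = BitsAndBytesConfig(load_in_8bit=True)

# bnb-4bit: NF4 with double quantization; compute dtype corresponds to
# the TRANSLATEGEMMA_BNB_4BIT_COMPUTE_DTYPE / _DOUBLE_QUANT settings.
four_bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Either object is passed as quantization_config= to from_pretrained().
```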
"bitsandbytes quantization requires a CUDA GPU":
- You're running on CPU or MPS (Apple Silicon).
- Set `quantization=none` or use a CUDA-capable GPU.

"bitsandbytes not installed":
- Install: `pip install bitsandbytes`
- Windows users: see bitsandbytes-windows-webui for prebuilt wheels (third-party; evaluate the risk yourself).
- ComfyUI Desktop users: quantization may require manual bitsandbytes installation; if the install fails, use `quantization=none` or run the 4B model.

"BitsAndBytesConfig not found":
- Upgrade transformers: `pip install --upgrade transformers`

"CUDA Setup failed" or "libbitsandbytes_cudaXXX not found" (import succeeds but loading fails):
- bitsandbytes was built for a different CUDA version, or your GPU's compute capability is unsupported.
- Set `quantization=none` as a workaround.
- Include the full error message when reporting issues.
Environment variables:
- `TRANSLATEGEMMA_BNB_4BIT_COMPUTE_DTYPE`: force the compute dtype for 4-bit (`bf16` or `fp16`). Default: auto-detect.
- `TRANSLATEGEMMA_BNB_4BIT_DOUBLE_QUANT`: enable double quantization (`1` = enabled, `0` = disabled). Default: `1`.
Limitations:
- Quantization is best-effort: the official TranslateGemma docs do not explicitly promise bitsandbytes support
- Translation quality may degrade slightly with quantization
- Not supported on CPU or MPS (Apple Silicon) — only CUDA GPUs
- Windows/Desktop users may encounter install issues with bitsandbytes
## Security / Reproducibility Notes

- The loader attempts `trust_remote_code=False` first and only falls back to `True` if required.
- Set `TRANSLATEGEMMA_ALLOW_REMOTE_CODE=0` to deny remote code entirely (loading fails if remote code is needed).
- Set `TRANSLATEGEMMA_REMOTE_CODE_ALLOWLIST=google/translategemma-4b-it,google/translategemma-12b-it` to allow only specific repos.
- You can pin a specific revision for reproducibility via `TRANSLATEGEMMA_REVISION=<commit-or-tag>`.
- By default, `debug=true` redacts sensitive data (user text content, full filesystem paths).
- Set `TRANSLATEGEMMA_VERBOSE_DEBUG=1` to enable full diagnostics (for troubleshooting).
- If a download is interrupted, the loader auto-resumes on the next run.
- If corruption persists, delete the model folder under `ComfyUI/models/LLM/TranslateGemma/` and retry.
## License

This repository is licensed under the MIT License (see LICENSE). TranslateGemma model weights are governed by Google's Gemma Terms of Use.
