
Commit fd60ae6

Add MultiGPU support for various model loaders and encoders
1 parent 01b41f4 commit fd60ae6

25 files changed

Lines changed: 648 additions & 0 deletions
# CheckpointLoaderNF4MultiGPU

`CheckpointLoaderNF4MultiGPU` wraps the NF4 checkpoint loader from `ComfyUI_bitsandbytes_NF4` so you can pick the execution device when working with 4-bit quantised diffusion checkpoints.

## Inputs

All base parameters from `CheckpointLoaderNF4` are retained. The MultiGPU wrapper adds one optional field:

| Parameter | Data Type | Description |
| --- | --- | --- |
| `device` | `STRING` | Device that should own the loaded NF4 checkpoint (GPU id or `cpu`). |

## Outputs

Outputs are identical to the upstream NF4 loader (UNet/CLIP/VAE tuple). The only behavioural change is the explicit device placement.
# DownloadAndLoadFlorence2ModelMultiGPU

`DownloadAndLoadFlorence2ModelMultiGPU` mirrors the download-and-load helper supplied by `ComfyUI-Florence2`, but with explicit device and offload selection so large Florence2 checkpoints can live on secondary GPUs or in CPU memory.

## Inputs

All original inputs from `DownloadAndLoadFlorence2Model` remain available. The MultiGPU wrapper introduces two optional selectors:

| Parameter | Data Type | Description |
| --- | --- | --- |
| `device` | `STRING` | Compute device to host the model once loaded. |
| `offload_device` | `STRING` | Device that receives automatic offloads (defaults to `cpu`). |

## Outputs

Outputs match the base Florence2 helper (model handle plus aux data). The only difference is that the returned model is already resident on the device you specified.
# DownloadAndLoadWav2VecModelMultiGPU

`DownloadAndLoadWav2VecModelMultiGPU` downloads a preset Wav2Vec2 checkpoint from Hugging Face (if missing) and loads it onto the device you choose, mirroring WanVideo's helper while adding MultiGPU awareness.

## Inputs

### Required

| Parameter | Data Type | Description |
| --- | --- | --- |
| `model` | `STRING` | Preset identifier (`TencentGameMate/chinese-wav2vec2-base` or `facebook/wav2vec2-base-960h`). |
| `base_precision` | `STRING` | Weight precision (`fp32`, `bf16`, `fp16`). |
| `load_device` | `STRING` | Wan loader slot (`main_device` or `offload_device`). |
| `device` | `STRING` | MultiGPU device to run the audio model. |

## Outputs

| Output Name | Data Type | Description |
| --- | --- | --- |
| `wav2vec_model` | `WAV2VECMODEL` | Downloaded and loaded Wav2Vec2 model. |
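The `base_precision` string has to be resolved into an actual dtype before the weights are loaded. A hedged sketch of that step, with the function name and mapping chosen for illustration (a real loader would map to `torch` dtypes rather than plain names):

```python
# Assumed mapping from the documented precision strings to dtype names.
PRECISION_DTYPES = {
    "fp32": "float32",
    "bf16": "bfloat16",
    "fp16": "float16",
}

def resolve_dtype(base_precision: str) -> str:
    """Return the dtype name for a precision string, rejecting unknown values."""
    try:
        return PRECISION_DTYPES[base_precision]
    except KeyError:
        raise ValueError(f"unsupported precision: {base_precision!r}")
```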
# FantasyTalkingModelLoaderMultiGPU

`FantasyTalkingModelLoaderMultiGPU` loads FantasyTalking diffusion models with explicit device control, making it easier to keep speech animation workloads off your primary compute GPU.

## Inputs

### Required

| Parameter | Data Type | Description |
| --- | --- | --- |
| `model` | `STRING` | FantasyTalking model from `ComfyUI/models/diffusion_models`. |
| `base_precision` | `STRING` | Precision for the weights (`fp32`, `bf16`, `fp16`). |
| `device` | `STRING` | MultiGPU device that should host the model. |

## Outputs

| Output Name | Data Type | Description |
| --- | --- | --- |
| `model` | `FANTASYTALKINGMODEL` | Loaded FantasyTalking model bundle. |
# Florence2ModelLoaderMultiGPU

`Florence2ModelLoaderMultiGPU` wraps the Florence2 model loader so you can decide which device handles model inference and which device receives Wan/Comfy offloads. Use it exactly like the original node from `ComfyUI-Florence2`; all native inputs remain available.

## Inputs

All parameters from `Florence2ModelLoader` are still supported. The MultiGPU variant adds the following optional fields:

| Parameter | Data Type | Description |
| --- | --- | --- |
| `device` | `STRING` | MultiGPU device used for runtime compute (`cuda:0`, `cuda:1`, `cpu`, etc.). |
| `offload_device` | `STRING` | Device that receives automatic model offloads (defaults to `cpu`). |

## Outputs

The outputs are identical to the upstream Florence2 loader (model tuple, additional metadata). Use them interchangeably in existing workflows; only the device placement behaviour changes.
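The compute/offload split that `device` and `offload_device` describe can be illustrated with a small state holder. The class and method names below are hypothetical; a real node would move a torch module with `.to(device)` instead of tracking a string:

```python
class DevicePlacement:
    """Illustrative sketch: track where a model currently lives."""

    def __init__(self, device: str, offload_device: str = "cpu"):
        self.device = device                 # runtime compute device
        self.offload_device = offload_device # where idle weights go
        self.current = offload_device        # models start offloaded

    def to_compute(self) -> str:
        """Bring the model onto the compute device before inference."""
        self.current = self.device
        return self.current

    def offload(self) -> str:
        """Return the model to the offload device after inference."""
        self.current = self.offload_device
        return self.current
```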

web/docs/LTXVLoaderMultiGPU.md

# LTXVLoaderMultiGPU

`LTXVLoaderMultiGPU` wraps `ComfyUI-LTXVideo`'s checkpoint loader so you can push LTX Video models to any GPU (or CPU) in your system without editing the base node.

## Inputs

Every input from the upstream `LTXVLoader` node is preserved. The MultiGPU version adds a single optional selector:

| Parameter | Data Type | Description |
| --- | --- | --- |
| `device` | `STRING` | MultiGPU device that should host the loaded LTX Video checkpoint. |

## Outputs

Outputs are identical to the original LTX Video loader. The loader simply ensures the returned model already resides on the selected device.
# LoadFluxControlNetMultiGPU

`LoadFluxControlNetMultiGPU` exposes device selection for XLabs-AI's FLUX ControlNet loader, letting you keep the ControlNet on a secondary GPU or the CPU while the main FLUX UNet stays on your primary compute device.

## Inputs

All inputs from the upstream `LoadFluxControlNet` node remain unchanged. The MultiGPU variant introduces one optional field:

| Parameter | Data Type | Description |
| --- | --- | --- |
| `device` | `STRING` | MultiGPU device that will host the ControlNet during inference. |

## Outputs

Outputs match the base FLUX ControlNet loader exactly; only the device placement differs.
# LoadWanVideoClipTextEncoderMultiGPU

`LoadWanVideoClipTextEncoderMultiGPU` loads WanVideo CLIP vision/text encoders on the device you specify, making it easy to keep encoders off your primary compute GPU when memory is tight.

## Inputs

### Required

| Parameter | Data Type | Description |
| --- | --- | --- |
| `model_name` | `STRING` | CLIP vision or text encoder model from `ComfyUI/models/clip_vision` or `ComfyUI/models/text_encoders`. |
| `precision` | `STRING` | Weight precision for the model (`fp16`, `fp32`, or `bf16`). |

### Optional

| Parameter | Data Type | Description |
| --- | --- | --- |
| `device` | `STRING` | Target MultiGPU device to host the encoder. |

## Outputs

| Output Name | Data Type | Description |
| --- | --- | --- |
| `wan_clip_vision` | `CLIP_VISION` | Loaded CLIP vision/text module ready for image conditioning. |
| `load_device` | `MULTIGPUDEVICE` | Device that now owns the encoder; feed into `WanVideoClipVisionEncode`. |
# LoadWanVideoT5TextEncoderMultiGPU

`LoadWanVideoT5TextEncoderMultiGPU` loads WanVideo T5 text encoders while letting you choose the MultiGPU device used for embedding work. The node returns both the encoder handle and the device string so downstream text nodes inherit placement automatically.

## Inputs

### Required

| Parameter | Data Type | Description |
| --- | --- | --- |
| `model_name` | `STRING` | T5 model from `ComfyUI/models/text_encoders`. |
| `precision` | `STRING` | Base precision for the encoder (`fp32` or `bf16`). |

### Optional

| Parameter | Data Type | Description |
| --- | --- | --- |
| `device` | `STRING` | MultiGPU device (defaults to secondary GPU when available). |
| `quantization` | `STRING` | Enable FP8 quantisation (`fp8_e4m3fn`) when supported. |

## Outputs

| Output Name | Data Type | Description |
| --- | --- | --- |
| `wan_t5_model` | `WANTEXTENCODER` | Loaded Wan T5 encoder bundle. |
| `load_device` | `MULTIGPUDEVICE` | Device string to reuse with `WanVideoTextEncode*` nodes. |
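The reason the node emits `load_device` alongside the encoder is so downstream text-encode nodes can inherit placement without the user re-selecting a device. A minimal sketch of that handshake, with all function names and the model identifier invented for illustration (the dicts stand in for real model objects):

```python
def load_t5(model_name: str, device: str = "cuda:1"):
    """Return (encoder, load_device), mirroring the node's two outputs."""
    encoder = {"name": model_name, "device": device}  # placeholder for the real T5 bundle
    return encoder, device

def text_encode(prompt: str, encoder: dict, load_device: str) -> dict:
    """Downstream node: runs on whatever device the loader chose."""
    return {"prompt": prompt, "device": load_device}

# Wire the loader's second output straight into the encode node:
enc, dev = load_t5("example-t5-checkpoint.safetensors")
embedding = text_encode("a cat", enc, dev)
```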
# WanVideoBlockSwapMultiGPU

`WanVideoBlockSwapMultiGPU` prepares block swap arguments for WanVideo models and adds an explicit `swap_device` selector so you can decide which device receives swapped transformer blocks.

## Inputs

| Parameter | Data Type | Description |
| --- | --- | --- |
| *(base Wan block swap inputs)* | *varies* | All parameters exposed by the upstream `WanVideoBlockSwap` node are available and behave identically. |
| `swap_device` | `STRING` | Additional MultiGPU device option that picks the destination for swapped layers (`cpu`, `cuda:1`, etc.). |

## Outputs

| Output Name | Data Type | Description |
| --- | --- | --- |
| `block_swap_args` | `BLOCKSWAPARGS` | Configuration dictionary to feed into `WanVideoModelLoaderMultiGPU` or Wan samplers. |
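Since the node's output is a configuration dictionary, its role can be sketched as a plain constructor. The exact keys inside `BLOCKSWAPARGS` are an assumption here; only `swap_device` is documented above as the MultiGPU addition:

```python
def make_block_swap_args(blocks_to_swap: int, swap_device: str = "cpu") -> dict:
    """Build a block-swap config dict (illustrative keys, not the real schema)."""
    if blocks_to_swap < 0:
        raise ValueError("blocks_to_swap must be non-negative")
    return {
        "blocks_to_swap": blocks_to_swap,  # how many transformer blocks to evict
        "swap_device": swap_device,        # where evicted blocks are parked
    }
```

The resulting dict would then be wired into `WanVideoModelLoaderMultiGPU` or a Wan sampler as the `block_swap_args` input.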
