# CheckpointLoaderSimpleDisTorch2MultiGPU

The `CheckpointLoaderSimpleDisTorch2MultiGPU` node loads checkpoint models (complete diffusion models containing UNet, CLIP, and VAE components) with DisTorch2 distributed tensor allocation, enabling multi-device VRAM management so that larger models can be spread across multiple GPUs.

This node automatically detects models located in the `ComfyUI/models/checkpoints` folder, and it will also read models from additional paths configured in the `extra_model_paths.yaml` file. Sometimes you may need to **refresh the ComfyUI interface** for it to pick up newly added model files.

## Inputs

| Parameter | Data Type | Description |
| --- | --- | --- |
| `ckpt_name` | `STRING` | The name of the checkpoint model to load. |
| `compute_device` | `STRING` | Target device for compute operations (e.g. `cuda:0`, `cuda:1`, `cpu`), selected from the devices available on your system. |
| `virtual_vram_gb` | `FLOAT` | Amount of virtual VRAM, in gigabytes, to allocate for distributed tensor management (default: `4.0`, range: `0.0`–`128.0`). |
| `donor_device` | `STRING` | Device that donates memory when virtual VRAM is allocated (default: `cpu`). |
| `expert_mode_allocations` | `STRING` | Advanced allocation string for manually specifying per-device distributions (e.g. `cuda:0,50%;cpu,*`). |
| `keep_loaded` | `BOOLEAN` | Whether to keep the model loaded when memory cleanup is triggered (default: `true`). |

## Outputs

| Output Name | Data Type | Description |
| --- | --- | --- |
| `MODEL` | `MODEL` | The loaded UNet diffusion model with DisTorch2 distributed allocation applied. |
| `CLIP` | `CLIP` | The loaded CLIP text encoder model. |
| `VAE` | `VAE` | The loaded VAE decoder/encoder model. |

## DisTorch2 Distributed Loading

DisTorch2 is an advanced memory management system that enables loading and running large diffusion models across multiple GPUs by intelligently distributing tensor allocations. Instead of loading an entire model on a single device, DisTorch2 splits the model's layers across available devices while maintaining computational efficiency.
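
The layer-splitting idea can be sketched in plain Python. This is an illustrative approximation, not DisTorch2's actual algorithm; the function and variable names are hypothetical:

```python
# Illustrative sketch (not DisTorch2's implementation): assign a model's
# layers to devices according to per-device fractions, keeping contiguous
# runs of layers on the same device.

def assign_layers(num_layers, fractions):
    """fractions: ordered mapping of device -> share of the model (sums to 1)."""
    assignments = {}
    start = 0
    devices = list(fractions.items())
    for i, (device, frac) in enumerate(devices):
        # The last device absorbs any rounding remainder.
        end = num_layers if i == len(devices) - 1 else start + round(num_layers * frac)
        assignments[device] = list(range(start, end))
        start = end
    return assignments

plan = assign_layers(20, {"cuda:0": 0.6, "cuda:1": 0.3, "cpu": 0.1})
# e.g. layers 0-11 on cuda:0, 12-17 on cuda:1, 18-19 on cpu
```

At inference time, each layer's weights would then live on its assigned device and be moved (or streamed) to the compute device as needed; that transfer logic is what the real system optimizes.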

### Key Concepts

**Virtual VRAM Allocation**: Artificially increases the available VRAM on the compute device by borrowing memory capacity from donor devices through intelligent tensor distribution.

**Expert Mode Allocations**: Advanced users can manually specify exactly how much of the model is placed on each device using ratio- or byte-based allocation strings.

### Allocation Examples

**Basic Virtual VRAM Mode**:
- `compute_device`: `cuda:0`
- `virtual_vram_gb`: `8.0`
- `donor_device`: `cuda:1`
- Result: loads the model as if `cuda:0` had 8 GB more VRAM available, using `cuda:1` as the memory donor.
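
The arithmetic behind this mode can be illustrated with hypothetical numbers (the 12 GB card and 16 GB model below are assumptions for the example, not values from the node):

```python
# Virtual VRAM math for a hypothetical 16 GB model on a 12 GB cuda:0,
# with 8 GB of virtual VRAM donated by cuda:1.
physical_vram_gb = 12.0  # assumed cuda:0 capacity
virtual_vram_gb = 8.0    # the node's virtual_vram_gb input
effective_gb = physical_vram_gb + virtual_vram_gb  # 20.0 GB apparent capacity

model_gb = 16.0
fits = model_gb <= effective_gb                       # True: 16 <= 20
on_donor_gb = max(0.0, model_gb - physical_vram_gb)   # 4.0 GB spills to cuda:1
```

In other words, the donor device holds whatever portion of the model exceeds the compute device's physical capacity.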

**Expert Ratio Allocation**:
- `expert_mode_allocations`: `cuda:0,60%;cuda:1,30%;cpu,10%`
- Distributes model layers with 60% on GPU 0, 30% on GPU 1, and 10% on the CPU.

**Expert Byte Allocation**:
- `expert_mode_allocations`: `cuda:0,4gb;cuda:1,2gb;cpu,*`
- Allocates exactly 4 GB to `cuda:0`, 2 GB to `cuda:1`, and the remainder to the CPU.
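
A small parser makes the allocation-string grammar shown above concrete: semicolon-separated `device,share` pairs, where a share is a percentage (`60%`), a byte size (`4gb`), or `*` for the remainder. This is a sketch of the documented format only; the tuple representation is illustrative, not DisTorch2's internal data structure:

```python
# Parse an expert_mode_allocations string such as "cuda:0,4gb;cuda:1,2gb;cpu,*".
def parse_allocations(spec):
    allocations = []
    for entry in spec.split(";"):
        device, share = entry.split(",")
        if share == "*":
            allocations.append((device, "remainder", None))   # takes whatever is left
        elif share.endswith("%"):
            allocations.append((device, "percent", float(share[:-1])))
        elif share.lower().endswith("gb"):
            allocations.append((device, "gigabytes", float(share[:-2])))
        else:
            raise ValueError(f"unrecognized share: {share!r}")
    return allocations

parse_allocations("cuda:0,4gb;cuda:1,2gb;cpu,*")
# [('cuda:0', 'gigabytes', 4.0), ('cuda:1', 'gigabytes', 2.0), ('cpu', 'remainder', None)]
```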

**Mixed Mode**:
Combines virtual VRAM with expert allocations for complex multi-device scenarios.