Commit f60aa6a
refactor: `keep_loaded` --> `eject_models` Boolean switch. Use it to eject all other models prior to loading a model for inference; helpful for maximizing available VRAM on-device prior to UNet inference, for example.
This changes behavior that was only just released in 2.5.0, but most users should see either an improvement or no change. That was the weakest and jankiest part of 2.5.0: my decision to manage a CPU memory leak turned into an overly aggressive solution with unwanted side effects.
This solution should provide a better way to manage `compute` VRAM. The most-requested feature is a way to remove everything else from VRAM prior to main UNet inference, which this accomplishes nicely, while also reporting back accurate DisTorch2 on-device shard sizes.

1 parent 72a2033
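A minimal sketch of the eject-before-load behavior described above. All names here (`ModelManager`, `load`, the `eject_models` parameter) are illustrative stand-ins, not the actual DisTorch2 API; model "sizes" stand in for VRAM allocations.

```python
# Hypothetical illustration of an eject_models Boolean switch: before
# loading a model for inference, optionally unload every other model so
# the target model sees the maximum free VRAM. Not the real API.

class ModelManager:
    def __init__(self):
        self.loaded = {}  # name -> stand-in VRAM footprint in MB

    def load(self, name, size_mb, eject_models=False):
        if eject_models:
            # Eject all other models first, freeing their VRAM.
            for other in list(self.loaded):
                if other != name:
                    del self.loaded[other]
        self.loaded[name] = size_mb
        return name

    def vram_in_use(self):
        return sum(self.loaded.values())


mgr = ModelManager()
mgr.load("clip", 500)
mgr.load("vae", 300)
# Loading the UNet with eject_models=True removes clip and vae first,
# so only the UNet's footprint remains on-device.
mgr.load("unet", 6000, eject_models=True)
print(sorted(mgr.loaded))   # ['unet']
print(mgr.vram_in_use())    # 6000
```

With `eject_models=False` (the default), loading behaves as before and previously loaded models stay resident.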
2 files changed
Lines changed: 2 additions & 2 deletions
(Diff contents were not captured in this page extract; file 1: line 24 replaced, file 2: line 4 replaced.)