Under development.
ggml-guided-diffusion is a pure ggml implementation of OpenAI's guided-diffusion. It loads the 512x512_diffusion_uncond_finetune_008100.pt checkpoint into ggml, with optional CUDA support.
All helper scripts now live under tests/torch-validation/scripts.
# 1) Activate environment
source .venv/bin/activate
# 2) Download CLIP ViT-B/16 weights into ./models/clip
python3 tests/torch-validation/scripts/download_clip.py
# 3) Optional compatibility conversion (creates models/clip/model.safetensors)
python3 tests/torch-validation/scripts/convert_clip_to_safetensors.py
# 4) Generate a CLIP-guided sample
python3 tests/torch-validation/scripts/clip_guided_diffusion.py \
--prompt "a cinematic photo of a red sports car" \
--steps 200 \
--clip-scale 120 \
--num-cutouts 16 \
--max-guidance 0.20 \
--output build/clip_guided_car.png

Tuning tips:
- Use `--image-size` as a multiple of 64 (128, 256, 512).
- Increase `--steps` (150-300) for stronger structure.
- Increase `--clip-scale` gradually (120 -> 200 -> 300).
- Keep `--max-guidance` between 0.15 and 0.30 for stability.
- Increase `--num-cutouts` (16-32) for better CLIP stability (slower).
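Guidance of this style typically scores the CLIP embeddings of the cutouts against the text embedding with a spherical distance loss. As a rough numpy sketch (whether clip_guided_diffusion.py uses exactly this form is an assumption, not confirmed by this repo):

```python
import numpy as np

def spherical_dist(x, y):
    """Squared great-circle distance between L2-normalized embeddings.

    This is the loss commonly used in CLIP-guided diffusion loops;
    treating it as the one used here is an illustrative assumption.
    """
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    y = y / np.linalg.norm(y, axis=-1, keepdims=True)
    chord = np.linalg.norm(x - y, axis=-1)  # chord length on the unit sphere
    return 2.0 * np.arcsin(chord / 2.0) ** 2
```

Averaging this loss over all cutout embeddings before taking the gradient is why raising `--num-cutouts` yields a smoother, more stable guidance signal at the cost of speed.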
This path keeps the sampler and guidance injection fully in C++, while using precomputed CLIP guidance gradients exported from Python for parity testing.
# 1) Export per-step CLIP gradients from the Python reference loop
source .venv/bin/activate
python3 tests/torch-validation/scripts/clip_guided_diffusion.py \
--prompt "test prompt" \
--image-size 128 \
--steps 20 \
--clip-scale 120 \
--num-cutouts 8 \
--max-guidance 0.20 \
--output build/clip_phase1_ref.png \
--dump-gradients build/clip_phase1_grads.bin
# 2) Run the C++ sampler with the same CLIP guidance values
./build/ggml-guided-diffusion ./models \
--sampler ddim \
--sample-width 128 \
--sample-height 128 \
--sample-steps 20 \
--sample-output build/clip_phase1_cpp.ppm \
--clip-gradients build/clip_phase1_grads.bin \
--clip-guidance-scale 120 \
--clip-max-guidance 0.20 \
--clip-guidance-verbose

Phase 1 scope:
- Implemented in C++: DDIM loop + CLIP guidance math injection.
- Pending for full C++ CLIP: tokenizer + CLIP transformer forward + gradient generation.
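The C++ side of the Phase 1 split can be pictured as one DDIM reverse step with a guidance term injected into the predicted noise. A minimal numpy sketch (the classifier-guidance-style injection point and the clamp interpretation of `--clip-max-guidance` are assumptions, not the repo's verified code):

```python
import numpy as np

def ddim_step(x_t, eps, clip_grad, a_t, a_prev, scale=120.0, max_guidance=0.20):
    """One deterministic DDIM reverse step with injected CLIP guidance.

    Sketch only: assumes the scaled CLIP gradient is clamped element-wise
    and subtracted from the predicted noise, classifier-guidance style.
    a_t / a_prev are cumulative alpha-bar values for the two timesteps.
    """
    g = np.clip(scale * clip_grad, -max_guidance, max_guidance)
    eps_hat = eps - np.sqrt(1.0 - a_t) * g                     # guided noise estimate
    x0 = (x_t - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)   # predicted clean image
    return np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps_hat
```

With clip_grad set to zero this collapses to a plain DDIM step, which is a convenient sanity check before feeding in the gradients dumped from Python.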
ggml supports CUDA builds, so this repo can be compiled with the CUDA backend enabled:
# Build ggml with CUDA backend
./build_ggml.sh --cuda
# Build app against CUDA-enabled ggml
./build_app.sh --jobs 8 --no-run
# Generate
./build/ggml-guided-diffusion ./models --sampler ddim --sample-steps 150 --sample-output build/out.ppm
ffmpeg -y -i build/out.ppm build/out.png
# Optional: specify CUDA arch list (example for Ampere)
./build_ggml.sh --cuda --cuda-arch 86
./build_app.sh --cuda --cuda-arch 86
# Optional: pin CUDA compiler explicitly (avoids /usr/bin/nvcc mismatches)
./build_ggml.sh --cuda --cuda-compiler /usr/local/cuda/bin/nvcc
./build_app.sh --cuda --cuda-compiler /usr/local/cuda/bin/nvcc

# C++ tests
ctest --test-dir build --output-on-failure
# Torch vs ggml parity check
# (checks tensor names, shapes, dtypes, and architecture signature)
./tests/torch-validation/scripts/run_manifest_parity.sh
# Diffusion scheduler and reverse-step math parity vs Torch
./tests/torch-validation/scripts/run_diffusion_math_parity.sh
# UNet forward parity (strict gate + stage diagnostics)
./tests/torch-validation/scripts/run_unet_forward_parity.sh
UNET_PARITY_STRICT=1 ./tests/torch-validation/scripts/run_unet_forward_parity.sh
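Each of these parity checks ultimately reduces to comparing a Torch reference tensor against the ggml output under a tolerance. A sketch of such a gate (the concrete atol/rtol values used by the scripts are illustrative assumptions):

```python
import numpy as np

def parity_gate(ref, got, atol=1e-4, rtol=1e-3):
    """Return (passed, max_abs_diff) for an element-wise tolerance check.

    Mirrors the spirit of a strict gate like UNET_PARITY_STRICT=1; an
    element fails if |ref - got| > atol + rtol * |ref|.
    """
    ref = np.asarray(ref, dtype=np.float64)
    got = np.asarray(got, dtype=np.float64)
    diff = np.abs(ref - got)
    passed = bool(np.all(diff <= atol + rtol * np.abs(ref)))
    return passed, float(diff.max())
```

Reporting the max absolute diff alongside the pass/fail bit is what makes the stage diagnostics useful: you can see how far off a stage is, not just that it failed.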
- src/zip_archive.cpp: raw C++ ZIP central-directory parsing and payload extraction.
- src/pickle_parser.cpp: raw C++ PyTorch pickle opcode parsing into tensor descriptors.
- src/model_loader.cpp: raw ggml tensor/context allocation and model materialization.
- src/main.cpp: thin CLI entrypoint and model iteration flow.
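The loader components above reimplement by hand what Python gets for free: a PyTorch .pt checkpoint is a ZIP archive whose data.pkl member is a pickle opcode stream describing the tensors. A stdlib sketch of that container structure (the toy payload here is a plain dict, not a real checkpoint):

```python
import io
import pickle
import pickletools
import zipfile

# Build a toy "checkpoint": a ZIP whose data.pkl member is a pickle stream.
# This is the container format src/zip_archive.cpp and src/pickle_parser.cpp
# parse in raw C++; the payload here is illustrative, not real tensor data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("archive/data.pkl", pickle.dumps({"out.0.weight": [0.1, 0.2]}))

with zipfile.ZipFile(buf) as zf:
    members = zf.namelist()
    # pickletools walks the same opcode stream pickle_parser.cpp interprets
    opcodes = [op.name for op, _arg, _pos in pickletools.genops(zf.read("archive/data.pkl"))]
```

zip_archive.cpp corresponds to the zipfile step (central-directory parsing), and pickle_parser.cpp to the pickletools step (opcode walking into tensor descriptors).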
Tests live under tests/, currently with parser and candidate-discovery coverage in tests/test_pickle_parser.cpp.
Codebase for Diffusion Models Beat GANs on Image Synthesis: