    Repositories list

    • ffpa-attn

      Public
      FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA.
      Cuda
      Apache License 2.0
      1627010 · Updated Apr 21, 2026
    • LeetCUDA

      Public
      📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
      Cuda
      GNU General Public License v3.0
      1.1k11k20 · Updated Apr 20, 2026
    • cache-dit

      Public
      A PyTorch-native Inference Engine with Cache Acceleration, Parallelism and Quantization for DiTs.
      Python
      Apache License 2.0
      69400 · Updated Apr 20, 2026
    • Awesome-LLM-Inference

      Public
      📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
      Python
      GNU General Public License v3.0
      3665.2k01 · Updated Apr 20, 2026
    • diffusers

      Public
      🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
      Python
      Apache License 2.0
      6.9k100 · Updated Apr 17, 2026
    • quack

      Public
      A Quirky Assortment of CuTe Kernels
      Python
      Apache License 2.0
      114200 · Updated Apr 17, 2026
    • cutlass

      Public
      CUDA Templates and Python DSLs for High-Performance Linear Algebra
      C++
      Other
      1.8k100 · Updated Apr 13, 2026
    • sglang

      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      Apache License 2.0
      5.5k100 · Updated Apr 2, 2026
    • TensorRT-LLM

      Public
      TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inferen…
      Python
      Other
      2.3k100 · Updated Apr 1, 2026
    • nunchaku

      Public
      [ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
      Python
      Apache License 2.0
      243300 · Updated Mar 31, 2026
    • 📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
      Python
      GNU General Public License v3.0
      2653800 · Updated Mar 19, 2026
    • 🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
      C++
      GNU General Public License v3.0
      7754.4k10 · Updated Mar 19, 2026
    • vllm-omni

      Public
      A framework for efficient model inference with omni-modality models
      Python
      Apache License 2.0
      805100 · Updated Mar 12, 2026
    • Distributed Compiler based on Triton for Parallel Systems
      Python
      MIT License
      138000 · Updated Mar 11, 2026
    • ao

      Public
      PyTorch native quantization and sparsity for training and inference
      Python
      Other
      493100 · Updated Mar 10, 2026
    • ComfyUI-CacheDiT

      Public
      Cache-DiT Node for ComfyUI
      Python
      Apache License 2.0
      14100 · Updated Feb 3, 2026
    • Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics a…
      Cuda
      Apache License 2.0
      399000 · Updated Jan 22, 2026
    • flux-fast

      Public
      A forked version of flux-fast that makes flux-fast even faster with cache-dit.
      Python
      17400 · Updated Jan 5, 2026
    • Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
      Python
      Apache License 2.0
      484100 · Updated Jan 1, 2026
    • Z-Image

      Public
      Python
      Apache License 2.0
      747100 · Updated Dec 25, 2025
    • PTX ISA 9.1 documentation converted to searchable markdown. Includes a Claude Code skill for CUDA development.
      Python
      30000 · Updated Dec 24, 2025
    • NVIDIA cuTile learn
      Python
      2000 · Updated Dec 9, 2025
    • .github

      Public
      0100 · Updated Nov 25, 2025
    • [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
      Python
      Apache License 2.0
      91000 · Updated Oct 30, 2025
    • 🔥LongCat-Video 1.7x🎉 speedup: cache acceleration and 4/8-bit weight-only quantization.
      Python
      0810 · Updated Oct 28, 2025
    • Python
      MIT License
      342000 · Updated Oct 28, 2025
    • ComfyUI

      Public
      The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
      Python
      GNU General Public License v3.0
      13k000 · Updated Oct 27, 2025
    • ⚡️Qwen-Image 4.8x🎉 speedup with Hybrid Acceleration for low VRAM GPUs
      Python
      Apache License 2.0
      01740 · Updated Oct 24, 2025
    • Kandinsky 5.0: A family of diffusion models for Video & Image generation
      Python
      Apache License 2.0
      57000 · Updated Oct 22, 2025
    • Wan2.1

      Public
      Wan: Open and Advanced Large-Scale Video Generative Models
      Python
      Apache License 2.0
      2.6k000 · Updated Oct 17, 2025