
# 🤗 Awesome-Backdoor-on-LMMs 🤗

A curated list of backdoor attacks and defenses on Large Multimodal Models (LMMs), covering vision-language pretrained models (VLPs), text-conditioned diffusion models (TDMs), large vision-language models (VLMs), and VLM-based agents. This list accompanies our survey:
**Backdoor Attacks and Defenses on Large Multimodal Models: A Survey**


Contributions are welcome: feel free to open PRs or issues for anything related to backdoors on LMMs. For any problems, please contact wangzhongqi23s@ict.ac.cn. If you find this repository useful for your research or work, we would appreciate it if you starred this repository and cited our papers listed below. ✨

## 📜 Table of Contents

- [👑 Awesome Papers](#-awesome-papers)
  - [Vision Language Pretrained Models](#vision-language-pretrained-models)
  - [Text Conditioned Diffusion Models](#text-conditioned-diffusion-models)
  - [Large Vision Language Models](#large-vision-language-models)
  - [VLM-based Embodied AI](#vlm-based-embodied-ai)
  - [Others](#others)
- [Other Related Awesome Repositories](#other-related-awesome-repositories)
- [🥳 Reference](#-reference)

## 👑 Awesome Papers

### Vision Language Pretrained Models

#### Backdoor Attack

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2021.06 | Poisoning and Backdooring Contrastive Learning | arXiv | link | - |
| 2021.07 | BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning | S&P'22 | link | code |
| 2022.09 | Data Poisoning Attacks Against Multimodal Encoders | ICML'23 | link | code |
| 2023.10 | GhostEncoder: Stealthy backdoor attacks with dynamic triggers to pre-trained encoders in self-supervised learning | CS'24 | link | - |
| 2023.11 | BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning | CVPR'24 | link | code |
| 2024.05 | Distribution Preserving Backdoor Attack in Self-supervised Learning | S&P'24 | link | - |
| 2024.08 | BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | MICCAI'24 | link | code |
| 2025.03 | MP-Nav: Enhancing Data Poisoning Attacks against Multimodal Learning | ICML'25 | link | - |
| 2025.03 | Backdooring CLIP through Concept Confusion | arXiv | link | - |
| 2025.10 | Invisible Backdoor Attack against Self-supervised Learning | CVPR'25 | link | code |
| 2025.11 | Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing | CIKM'25 | link | code |
| 2025.11 | ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training | NeurIPS'25 | link | code |
| 2026.01 | Backdoor Attacks on Multi-modal Contrastive Learning | arXiv | link | - |
| 2026.01 | Stealthy Backdoor Carriers: The Threat of Visual Prompts to CLIP | IOTJ | link | code |
| 2026.02 | BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning | arXiv | link | - |
| 2026.03 | Dormant Backdoor: Weaponizing Model Finetuning for Feasible Backdoor Attacks against Pretrained Models | AAAI'26 | link | code |
| 2026.04 | Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models | arXiv | link | code |

#### Backdoor Defense

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2023.02 | ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms | USENIX'23 | link | code |
| 2023.03 | CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning | ICCV'23 | link | code |
| 2023.03 | Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks | NeurIPS'23 | link | - |
| 2023.03 | Detecting Backdoors in Pre-trained Encoders | CVPR'23 | link | code |
| 2023.10 | Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks | ICML'24 | link | code |
| 2024.03 | Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning | CVPRW'24 | link | - |
| 2024.09 | Adversarial Backdoor Defense in CLIP | arXiv | link | - |
| 2024.09 | CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor Defense | arXiv | link | - |
| 2024.11 | Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment | CVPR'24 | link | code |
| 2024.11 | DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders | CVPR'25 | link | code |
| 2024.12 | Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning | arXiv | link | - |
| 2024.12 | Detecting Backdoor Samples in Contrastive Language Image Pretraining | ICLR'25 | link | code |
| 2024.12 | Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP | arXiv | link | code |
| 2025.02 | A Closer Look at Backdoor Attacks on CLIP | ICML'25 | link | - |
| 2025.02 | Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in CLIP | arXiv | link | - |
| 2025.12 | Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models | arXiv | link | code |
| 2026.01 | Robust defense strategies for multimodal contrastive learning: efficient fine-tuning against backdoor attacks | Multimedia Tools and Applications | link | - |
| 2026.02 | InverTune: A Backdoor Defense Method for Multimodal Contrastive Learning via Backdoor-Adversarial Correlation Analysis | NDSS'26 | link | - |
| 2026.03 | DIFT: Protecting Contrastive Learning Against Data Poisoning Backdoor Attacks | AAAI'26 | link | - |
| 2026.03 | BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder | arXiv | link | code |
| 2026.04 | CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion | arXiv | link | - |

### Text Conditioned Diffusion Models

#### Backdoor Attack

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2022.11 | Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis | ICCV'23 | link | code |
| 2023.05 | Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models | AAAI'24 | link | code |
| 2023.05 | Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning | ACM MM'23 | link | code |
| 2023.06 | VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models | NeurIPS'23 | link | code |
| 2023.07 | BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models | TIFS'24 | link | code |
| 2023.08 | Backdooring Textual Inversion for Concept Censorship | arXiv | link | code |
| 2023.10 | Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models | S&P'24 | link | code |
| 2024.01 | The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline | ICML'24 | link | code |
| 2024.06 | Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors | arXiv | link | - |
| 2024.07 | Control ControlNet: Multidimensional Backdoor Attack Based on ControlNet | ICONIP'24 | link | code |
| 2024.10 | EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second | ACM MM'24 | link | code |
| 2024.11 | Combinational Backdoor Attack against Customized Text-to-Image Models | arXiv | link | - |
| 2024.11 | TrojanEdit: Backdooring Text-Based Image Editing Models | arXiv | link | - |
| 2025.02 | Imperceptible Backdoor Attacks on Text-Guided 3D Scene Grounding | TMM'25 | link | - |
| 2025.03 | Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model | arXiv | link | code |
| 2025.03 | Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models | CVPR'25 | link | code |
| 2025.04 | BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation | ICCV'25 | link | code |
| 2025.04 | Erased but Not Forgotten: How Backdoors Compromise Concept Erasure | arXiv | link | code |
| 2025.04 | REDEditing: Relationship-Driven Precise Backdoor Poisoning on Text-to-Image Diffusion Models | arXiv | link | - |
| 2025.06 | TWIST: Text-encoder Weight-editing for Inserting Secret Trojans in Text-to-Image Models | ACL'25 | link | - |
| 2025.08 | Practical, Generalizable and Robust Backdoor Attacks on Text-to-Image Diffusion Models | arXiv | link | - |
| 2025.08 | BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models | arXiv | link | - |
| 2026.01 | Key-Value Mapping-Based Text-to-Image Diffusion Model Backdoor Attacks | Algorithms | link | code |
| 2026.02 | Bad-PoseDiff: Pose-Guided Backdoor Triggering in Diffusion Models | TrustCom'25 | link | - |
| 2026.02 | Semantic-level Backdoor Attack against Text-to-Image Diffusion Models | arXiv | link | - |
| 2026.02 | When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks | arXiv | link | - |
| 2026.02 | When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters | CVPR'26 | link | code |
| 2026.03 | Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models | ICLRW'26 | link | - |

#### Backdoor Defense

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2024.04 | UFID: A Unified Framework for Black-box Input-level Backdoor Detection on Diffusion Models | AAAI'25 | link | code |
| 2024.07 | T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models | ECCV'24 | link | code |
| 2024.08 | Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks | ECCVW'24 | link | code |
| 2024.11 | Fine-grained Prompt Screening: Defending Against Backdoor Attack on Text-to-Image Diffusion Models | IJCAI'25 | link | - |
| 2025.01 | Backdoor Defense for Text Encoders in Text-to-Image Generative Models | IEEE TDSC'25 | link | code |
| 2025.02 | BackdoorDM: A Comprehensive Benchmark for Backdoor Learning in Diffusion Model | NeurIPS'25 | link | code |
| 2025.03 | Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation | arXiv | link | - |
| 2025.04 | Backdoor Defense in Diffusion Models via Spatial Attention Unlearning | arXiv | link | - |
| 2025.04 | Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models | TPAMI'25 | link | code |
| 2026.01 | On the Fairness, Diversity and Reliability of Text-to-Image Generative Models | Artificial Intelligence Review | link | code |
| 2026.02 | Backdoor Sentinel: Detecting and Detoxifying Backdoors in Diffusion Models via Temporal Noise Consistency | arXiv | link | - |
| 2026.03 | BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation | CVPR'26 | link | code |
| 2026.03 | A Dual-Purpose Framework for Backdoor Defense and Backdoor Amplification in Diffusion Models | TIFS'26 | link | - |
| 2026.04 | Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling | arXiv | link | - |

### Large Vision Language Models

#### Backdoor Attack

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2024.02 | Shadowcast: Stealthy Data Poisoning Attacks against Vision-Language Models | NeurIPS'24 | link | code |
| 2024.02 | VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | IJCV'25 | link | code |
| 2024.02 | Test-Time Backdoor Attacks on Multimodal Large Language Models | arXiv | link | code |
| 2024.03 | ImgTrojan: Jailbreaking Vision-Language Models with ONE Image | NAACL'25 | link | code |
| 2024.03 | TrojVLM: Backdoor Attack Against Vision Language Models | ECCV'24 | link | - |
| 2024.04 | Physical Backdoor Attack Can Jeopardize Driving with Vision-Large-Language Models | ICMLW'24 | link | - |
| 2024.06 | Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift | CVPR'25 | link | code |
| 2024.10 | Backdooring Vision-Language Models with Out-of-Distribution Data | ICLR'25 | link | - |
| 2025.02 | Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models | CVPR'25 | link | code |
| 2025.03 | BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models | CVPR'25 | link | - |
| 2025.05 | Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving | arXiv | link | - |
| 2025.06 | Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation | arXiv | link | - |
| 2025.07 | Shadow-Activated Backdoor Attacks on Multimodal Large Language Models | ACL'25 | link | code |
| 2025.08 | IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding | arXiv | link | - |
| 2025.09 | TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models | arXiv | link | code |
| 2025.11 | MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models | arXiv | link | code |
| 2025.11 | BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models | arXiv | link | code |
| 2026.04 | Hidden Ads: Behavior-Triggered Semantic Backdoors for Advertisement Injection in Vision-Language Models | arXiv | link | - |
| 2026.04 | Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers | arXiv | link | - |
| 2026.04 | Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction | arXiv | link | - |
| 2026.04 | Phantasia: Context-Adaptive Backdoors in Vision Language Models | arXiv | link | code |

#### Backdoor Defense

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2025.05 | Backdoor Cleaning without External Guidance in MLLM Fine-tuning | NeurIPS'25 | link | code |
| 2025.06 | Robust Anti-Backdoor Instruction Tuning in LVLMs | arXiv | link | - |
| 2025.06 | SRD: Reinforcement-Learned Semantic Perturbation | AAAI'26 | link | code |
| 2026.01 | From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Online Test-Time Backdoor Defense | arXiv | link | - |
| 2026.01 | TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning | arXiv | link | code |
| 2026.03 | Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model | AAAI'26 | link | - |
| 2026.03 | PurMM: Attention-Guided Test-Time Backdoor Purification in Multimodal Large Language Models | AAAI'26 | link | - |
| 2026.03 | Test-Time Attention Purification for Backdoored Large Vision Language Models | arXiv | link | - |
| 2026.03 | Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models | arXiv | link | code |
| 2026.04 | A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models | arXiv | link | - |
| 2026.04 | Meta-Research on Backdoors: Dataset and Threat Model Shifts in Multimodal Backdoor Attacks | arXiv | link | - |

### VLM-based Embodied AI

#### Backdoor Attack

**VLA**

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2025.05 | BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization | arXiv | link | code |
| 2025.10 | TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models | arXiv | link | code |
| 2025.11 | AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models | arXiv | link | - |
| 2026.01 | State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space | arXiv | link | - |
| 2026.02 | Inject Once Survive Later: Backdooring Vision-Language-Action Models to Persist Through Downstream Fine-tuning | arXiv | link | code |

**GUI**

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2025.05 | Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents | EMNLP'25 | link | code |
| 2025.06 | Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents | arXiv | link | - |
| 2025.07 | VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation | COLM'25 | link | code |
| 2025.09 | Realistic Environmental Injection Attacks on GUI Agents | arXiv | link | code |
| 2026.03 | SlowBA: An efficiency backdoor attack towards VLM-based GUI agents | arXiv | link | code |

#### Backdoor Defense

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2026.02 | When Attention Betrays: Erasing Backdoor Attacks in Robotic Policies by Reconstructing Visual Tokens | ICRA'26 | link | - |

### Others

| Time | Title | Venue | Paper | Code |
|------|-------|-------|-------|------|
| 2026.03 | Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models | arXiv | link | code |

## Other Related Awesome Repositories

## 🥳 Reference

If you find this repository helpful for your research, we would greatly appreciate it if you could cite our papers. ✨

@article{Wang_2025,
  title={Backdoor Attacks and Defenses on Large Multimodal Models: A Survey},
  author={Wang, Zhongqi and Zhang, Jie and Bao, Kexin and Liang, Yifei and Shan, Shiguang and Chen, Xilin},
  year={2025},
  month=dec
}

@article{wang2025amdet,
  title={Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models},
  author={Zhongqi Wang and Jie Zhang and Shiguang Shan and Xilin Chen},
  journal={arXiv preprint arXiv:2512.00343},
  year={2025}
}

@article{wang2025dynamicattentionanalysisbackdoor,
  title={Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models},
  author={Zhongqi Wang and Jie Zhang and Shiguang Shan and Xilin Chen},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year={2025}
}

@article{zhang2025twt,
  title={Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models},
  author={Jie Zhang and Zhongqi Wang and Shiguang Shan and Xilin Chen},
  journal={arXiv preprint arXiv:2503.17724},
  year={2025}
}

@inproceedings{10.1007/978-3-031-73013-9_7,
  title={T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models},
  author={Wang, Zhongqi and Zhang, Jie and Shan, Shiguang and Chen, Xilin},
  booktitle={Computer Vision -- ECCV 2024},
  year={2025},
  publisher={Springer Nature Switzerland},
  address={Cham},
  pages={107--124},
  isbn={978-3-031-73013-9}
}
