Curated list of backdoor attacks and defenses on Large Multimodal Models (LMMs), aligned with our work:
Backdoor Attacks and Defenses on Large
Multimodal Models: A Survey
Any additional things regarding backdoor, PRs, issues are welcome. Any problems, please contact wangzhongqi23s@ict.ac.cn. If you find this repository useful to your research or work, it is really appreciated to star this repository and cite our papers here. ✨
- Vision Language Pretrained Models (VLPs)
- Text Conditioned Diffusion Models (TDMs)
- Large Vision Language Models (LVLMs)
- VLM-based Embodied AI
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2021.06 | POISONING AND BACKDOORING CONTRASTIVE LEARNING | arXiv | link | - |
| 2021.07 | BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning | SSP'22 | link | code |
| 2022.09 | Data Poisoning Attacks Against Multimodal Encoders | ICML'23 | link | code |
| 2023.10 | GhostEncoder: Stealthy backdoor attacks with dynamic triggers to pre-trained encoders in self-supervised learning | CS'24 | link | - |
| 2023.11 | BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning | CVPR'24 | link | code |
| 2024.05 | Distribution Preserving Backdoor Attack in Self-supervised Learning | SSP'24 | link | - |
| 2024.08 | BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | MICCAI'24 | link | code |
| 2025.03 | MP-Nav: Enhancing Data Poisoning Attacks against Multimodal Learning | ICML'25 | link | - |
| 2025.03 | Backdooring CLIP through Concept Confusion | arXiv | link | - |
| 2025.10 | Invisible Backdoor Attack against Self-supervised Learning | CVPR'25 | link | code |
| 2025.11 | Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing | CIKM'25 | link | code |
| 2025.11 | ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training | NeurIPS'25 | link | code |
| 2026.01 | Backdoor Attacks on Multi-modal Contrastive Learning | arXiv | link | - |
| 2026.01 | Stealthy Backdoor Carriers: The Threat of Visual Prompts to CLIP | IOTJ | link | code |
| 2026.02 | BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning | arXiv | link | - |
| 2026.03 | Dormant Backdoor: Weaponizing Model Finetuning for Feasible Backdoor Attacks against Pretrained Models | AAAI'26 | link | code |
| 2026.04 | TEALTHY AND ADJUSTABLE TEXT-GUIDED BACKDOOR ATTACKS ON MULTIMODAL PRETRAINED MODELS | arXiv | link | code |
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2023.02 | ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms | USENIX'23 | link | code |
| 2023.03 | CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning | ICCV'23 | link | code |
| 2023.03 | Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks | NeurIPS'23 | link | - |
| 2023.03 | Detecting Backdoors in Pre-trained Encoders | CVPR'23 | link | code |
| 2023.10 | Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks | ICML'24 | link | code |
| 2024.03 | Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning | CVPRW'24 | link | - |
| 2024.09 | Adversarial Backdoor Defense in CLIP | arXiv | link | - |
| 2024.09 | CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor | arXiv | link | - |
| 2024.11 | Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment | CVPR'24 | link | code |
| 2024.11 | DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders | CVPR'25 | link | code |
| 2024.12 | Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning | arXiv | link | - |
| 2024.12 | DETECTING BACKDOOR SAMPLES IN CONTRASTIVE LANGUAGE IMAGE PRETRAINING | ICLR'25 | link | code |
| 2024.12 | Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP | arXiv | link | code |
| 2025.02 | A Closer Look at Backdoor Attacks on CLIP | ICML'25 | link | - |
| 2025.02 | Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in CLIP | arXiv | link | - |
| 2025.12 | Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models | arXiv | link | code |
| 2026.01 | Robust defense strategies for multimodal contrastive learning: efficient fine-tuning against backdoor attacks | Multimedia Tools and Applications | link | - |
| 2026.02 | InverTune: A Backdoor Defense Method for Multimodal Contrastive Learning via Backdoor-Adversarial Correlation Analysis | NDSS'26 | link | - |
| 2026.03 | DIFT: Protecting Contrastive Learning Against Data Poisoning Backdoor Attacks | AAAI'26 | link | - |
| 2026.03 | BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder | arXiv | link | code |
| 2026.04 | CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion | arXiv | link | - |
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2022.11 | Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis | ICCV'23 | link | code |
| 2023.05 | Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models | AAAI'24 | link | code |
| 2023.05 | Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning | ACM MM'23 | link | code |
| 2023.06 | VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models | NeurIPS'23 | link | code |
| 2023.07 | BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models | TIFS'24 | link | code |
| 2023.08 | Backdooring Textual Inversion for Concept Censorship | arXiv | link | code |
| 2023.10 | Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models | S&P'24 | link | code |
| 2024.01 | The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline | ICML'24 | link | code |
| 2024.06 | Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors | arXiv | link | - |
| 2024.07 | Control ControlNet: Multidimensional Backdoor Attack Based on ControlNet | ICONIP'24 | link | code |
| 2024.10 | EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second | ACM MM'24 | link | code |
| 2024.11 | Combinational Backdoor Attack against Customized Text-to-Image Models | arXiv | link | - |
| 2024.11 | TrojanEdit: Backdooring Text-Based Image Editing Models | arXiv | link | - |
| 2025.02 | Imperceptible Backdoor Attacks on Text-Guided 3D Scene Grounding | TMM'25 | link | - |
| 2025.03 | Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model | arXiv | link | code |
| 2025.03 | Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models | CVPR'25 | link | code |
| 2025.04 | BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation | ICCV'25 | link | code |
| 2025.04 | Erased but Not Forgotten: How Backdoors Compromise Concept Erasure | arXiv | link | code |
| 2025.04 | REDEditing: Relationship-Driven Precise Backdoor Poisoning on Text-to-Image Diffusion Models | arXiv | link | - |
| 2025.06 | TWIST: Text-encoder Weight-editing for Inserting Secret Trojans in Text-to-Image Models | ACL'25 | link | - |
| 2025.08 | Practical, Generalizable and Robust Backdoor Attacks on Text-to-Image Diffusion Models | arXiv | link | - |
| 2025.08 | BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models | arXiv | link | - |
| 2026.01 | Key-Value Mapping-Based Text-to-Image Diffusion Model Backdoor Attacks | Algorithms | link | code |
| 2026.02 | Bad-PoseDiff: Pose-Guided Backdoor Triggering in Diffusion Models | TrustCom'25 | link | - |
| 2026.02 | Semantic-level Backdoor Attack against Text-to-Image Diffusion Models | arXiv'26 | link | - |
| 2026.02 | When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks | arXiv'26 | link | - |
| 2026.02 | When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters | CVPR'26 | link | code |
| 2026.03 | Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models | ICLRW'26 | link | - |
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2024.04 | UFID: A Unified Framework for Black-box Input-level Backdoor Detection on Diffusion Models | AAAI'25 | link | code |
| 2024.07 | T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models | ECCV'24 | link | code |
| 2024.08 | Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks | ECCVW'24 | link | code |
| 2024.11 | Fine-grained Prompt Screening: Defending Against Backdoor Attack on Text-to-Image Diffusion Models | IJCAI'25 | link | - |
| 2025.01 | Backdoor Defense for Text Encoders in Text-to-Image Generative Models | IEEE TDSC'25 | link | code |
| 2025.02 | BackdoorDM: A Comprehensive Benchmark for Backdoor Learning in Diffusion Model | NeurIPS'25 | link | code |
| 2025.03 | Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation | arXiv | link | - |
| 2025.04 | Backdoor Defense in Diffusion Models via Spatial Attention Unlearning | arXiv | link | - |
| 2025.04 | Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models | TPAMI'25 | link | code |
| 2026.01 | On the Fairness, Diversity and Reliability of Text-to-Image Generative Models | Artificial Intelligence Review | link | code |
| 2026.02 | Backdoor Sentinel: Detecting and Detoxifying Backdoors in Diffusion Models via Temporal Noise Consistency | arXiv | link | - |
| 2026.03 | BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation | CVPR'26 | link | code |
| 2026.03 | A Dual-Purpose Framework for Backdoor Defense and Backdoor Amplification in Diffusion Models | TIFS'26 | link | - |
| 2026.04 | Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling | arXiv | link | - |
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2024.02 | Shadowcast: Stealthy Data Poisoning Attacks against Vision-Language Models | NeurIPS'24 | link | code |
| 2024.02 | VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | IJCV'25 | link | code |
| 2024.02 | Test-Time BACKDOOR ATTACKS ON MULTIMODAL LARGE LANGUAGE MODELS | arXiv | link | code |
| 2024.03 | ImgTrojan: Jailbreaking Vision-Language Models with ONE Image | NAACL'25 | link | code |
| 2024.03 | TrojVLM: Backdoor Attack Against Vision Language Models | ECCV'24 | link | - |
| 2024.04 | PHYSICAL BACKDOOR ATTACK CAN JEOPARDIZE DRIVING WITH VISION-LARGE-LANGUAGE MODELS | ICMLW'24 | link | - |
| 2024.06 | Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift | CVPR'25 | link | code |
| 2024.10 | BACKDOORING VISION-LANGUAGE MODELS WITH OUT-OF-DISTRIBUTION DATA | ICLR'25 | link | - |
| 2025.02 | Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models | CVPR'25 | link | code |
| 2025.03 | BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models | CVPR'25 | link | - |
| 2025.05 | Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving | arXiv | link | - |
| 2025.06 | Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation | arXiv | link | |
| 2025.07 | Shadow-Activated Backdoor Attacks on Multimodal Large Language Models | ACL'25 | link | code |
| 2025.08 | IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding | arXiv | link | - |
| 2025.09 | TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models | arXiv | link | code |
| 2025.11 | MTAttack: Multi-Target Backdoor Attacks against Large Vision-LanguageModels | arXiv | link | code |
| 2025.11 | BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models | arXiv | link | code |
| 2026.04 | HIDDEN ADS: Behavior-Triggered Semantic Backdoors for Advertisement Injection in Vision–Language Models | arXiv | link | - |
| 2026.04 | Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers | arXiv | link | - |
| 2026.04 | Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction | arXiv | link | - |
| 2026.04 | Phantasia: Context-Adaptive Backdoors in Vision Language Models | arXiv | link | code |
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2025.05 | Backdoor Cleaning without External Guidance in MLLM Fine-tuning | NeurIPS'25 | link | code |
| 2025.06 | ROBUST ANTI-BACKDOOR INSTRUCTION TUNING IN LVLMS | arXiv | link | - |
| 2025.06 | SRD: Reinforcement-Learned Semantic Perturbation | AAAI'26 | link | code |
| 2026.01 | From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Online Test-Time Backdoor Defense | arXiv | link | - |
| 2026.01 | TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning | arXiv | link | code |
| 2026.03 | Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model | AAAI'26 | link | - |
| 2026.03 | PurMM: Attention-Guided Test-Time Backdoor Purification in Multimodal Large Language Models | AAAI'26 | link | - |
| 2026.03 | Test-Time Attention Purification for Backdoored Large Vision Language Models | arXiv | link | - |
| 2026.03 | Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models | arXiv | link | code |
| 2026.04 | A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models | arXiv | link | - |
| 2026.04 | Meta-Research on Backdoors: Dataset and Threat Model Shifts in Multimodal Backdoor Attacks | arXiv | link | - |
- VLA
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2025.05 | BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization | arXiv | link | code |
| 2025.10 | TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models | arXiv | link | code |
| 2025.11 | AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models | arXiv | link | - |
| 2026.01 | State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space | arXiv | link | - |
| 2026.02 | Inject Once Survive Later: Backdooring Vision-Language-Action Models to Persist Through Downstream Fine-tuning | arXiv | link | code |
- GUI
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2025.05 | Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents | EMNLP’25 | link | code |
| 2025.06 | Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents | arXiv | link | - |
| 2025.07 | VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation | COLM‘25 | link | code |
| 2025.09 | Realistic Environmental Injection Attacks on GUI Agents | arXiv | link | code |
| 2026.03 | SlowBA: An efficiency backdoor attack towards VLM-based GUI agents | arXiv | link | code |
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2026.02 | When Attention Betrays: Erasing Backdoor Attacks in Robotic Policies by Reconstructing Visual Tokens | ICRA'26 | link | - |
| Time | Title | Venue | Paper | Code |
|---|---|---|---|---|
| 2026.03 | Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models | arXiv | link | code |
- Awesome Data Poisoning and Backdoor Attacks
- Awesome-Backdoor-in-Deep-Learning
- Backdoor Learning Resources
- Awesome-LVLM-Attack
- Awesome-Large-Model-Safety
If you find this repository helpful for your research, we would greatly appreciate it if you could cite our papers. ✨
@article{Wang_2025,
title={Backdoor Attacks and Defenses on Large Multimodal Models: A Survey},
author={Wang, Zhongqi and Zhang, Jie and Bao, Kexin and Liang, Yifei and Shan, Shiguang and Chen, Xilin},
year={2025},
month=dec }
@article{wang2025amdet,
title={Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models},
author={Zhongqi Wang and Jie Zhang and Shiguang Shan and Xilin Chen},
journal={arXiv preprint arXiv:2512.00343},
year={2025},
}
@article{wang2025dynamicattentionanalysisbackdoor,
title={Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models},
author={Zhongqi Wang and Jie Zhang and Shiguang Shan and Xilin Chen},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
year={2025},
}
@article{zhang2025twt,
title={Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models},
author={Jie Zhang and Zhongqi Wang and Shiguang Shan and Xilin Chen},
journal={arXiv preprint arXiv:2503.17724},
year={2025},
}
@InProceedings{10.1007/978-3-031-73013-9_7,
author="Wang, Zhongqi
and Zhang, Jie
and Shan, Shiguang
and Chen, Xilin",
title="T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models",
booktitle="Computer Vision -- ECCV 2024",
year="2025",
publisher="Springer Nature Switzerland",
address="Cham",
pages="107--124",
isbn="978-3-031-73013-9"
}
