- Conducted research on cutting-edge defense strategies for backdoor attacks on large language models, with practical implications for improving AI security in real-world applications
- Wrote a research paper on findings through collaboration with Royal Holloway, Mitigating Backdoor Attacks in LLMs.pdf
- Undertook preliminary work for future research and development
- 2 Python programmes (variable_obfuscator.py and formatted_evaluator) used for detecting and preventing backdoor attacks in large language models (LLMs).
- Setup Instructions
- How To Use Our Programs
- Our VM Settings
- Ensure you have a GPU that can use CUDA.
- If you do not wish to run a GCP VM, skip to step 2
- If you have a VM already setup, skip to step 3
1.1 Open GCP VM, with help of this video (up until you have the terminal open): https://www.youtube.com/watch?v=O2OZFH6RT38&t=784s We use a GCP virtual machine with the following settings:
- Machine Type: Intel Haswell n1-standard-16
- GPU: NVIDIA Tesla P100
- Image: ubuntu-pro-1804-bionic-v20230711
- A list of package versions in our conda environment can be found in “environment.txt” within the CodeT5 folder.
1.2 Once opened a new VM, run the following commands:
-
sudo apt-get update -y
-
sudo apt-get install python3-pip -y
-
pip3 install setuptools
-
pip3 install jupyterlab
1.3 The following command will be used everytime to open the VM in a jupyterlab session:
-
.local/bin/jupyter-lab --no-browser
1.4 Next open local browser and visit the following page:
1.5 Use token on the link provided after executing the command as the password:
- Example: http://localhost:8888/lab?token=4010480c6718f38001453f91e6c78ec10ff18f866520b091
- "4010480c6718f38001453f91e6c78ec10ff18f866520b091" is your password
- You should only have to do this once
2.1 Install Git:
-
sudo apt install git
2.1 Download and Install Conda (or any other environemnt manager):
- Visit this website to download and install conda: https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html
2.2 Create a New Conda environment:
-
conda create --name [environment_name] python=3.7.16
2.3 Activate Conda environment:
-
conda activate [environment_name]
2.4 Install Docker Engine:
- Follow step 1 on this website:
- https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04
2.5 Install NVIDIA Drivers:
-
apt-search nvidia-driver
-
sudo apt install nvidia-driver-[latest_driver]
- Restart VM
2.6 Install NVIDIA Container Toolkit:
-
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) -
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
-
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit -
sudo systemctl restart docker
3.1 Clone our repository:
-
git clone https://github.com/AceMegalodon/Mitigating_Backdoor_Attacks_in_LLMs.git
3.2 Download the original respository:
- https://figshare.com/articles/dataset/ICSE-23-Replication_7z/20766577/1
- Install the .7z extractor:
-
sudo apt-get install p7zip-full
- Extract the .7z file:
-
7z x adversarial-backdoor-for-code-models.7z
3.3 From the original repository, move the following files to the cloned repository:
-
mv adversarial-backdoor-for-code-models/CodeT5 Mitigating_Backdoor_Attacks_in_LLMs/adversarial-backdoor-for-code-models
-
mv adversarial-backdoor-for-code-models/datasets Mitigating_Backdoor_Attacks_in_LLMs/adversarial-backdoor-for-code-models
3.4 Delete the original repository:
-
rm -r adversarial-backdoor-for-code-models
3.5 Change directory to the cloned repository:
-
cd Mitigating_Backdoor_Attacks_in_LLMs
3.6 From the cloned repository, move the following files to their appropriate location:
-
mv renaming_results formatted_evaluator.py poisoned_reduced_dataset.txt reduced_dataset.txt results_evaluator.py variable_obfuscator.py adversarial-backdoor-for-code-models/CodeT5/sh
3.7 Assuming you are currently in your new enviornment, install the requirements:
- Note that the requirements.txt is a superset of the requirements for this product. The contents of the file are our exact versions of various packages.
-
pip install -r requirements.txt
3.8 Dataset preperation has already taken place, so you should not need to look at the README located:
- Mitigating_Backdoor_Attacks_in_LLMs/adversarial-backdoor-for-code-models/README.md
- Instead, you should now refer to this README:
- Mitigating_Backdoor_Attacks_in_LLMs/adversarial-backdoor-for-code-models/CodeT5/README.md
- variable_obfuscator.py demonstrates the variable obfuscation process.
- Install the nltk data used for variable_obfuscator:
-
python -m nltk.downloader all
- Run the following command to run the obfuscation without fine tuning the model (this is for demonstration and will not effect fine tuning results).
-
python variable_obfuscator.py
- Examples of both matching with synonyms and complete / partial obfuscation are in varaible_obfuscator.py
- If you wish to use these processes in fine tuning and get results for the models, you can put these one of these two functions into _utils and replace their name with read_summarize_examples_adv.
- Ensure to comment out the original read_summarize_examples_adv function you do not wish to use.
- Then run the training command, for example:
-
nohup python run_exp.py \ --model_tag codebert \ --task summarize-adv-0.05 \ --sub_task python \ --gpu 0
-
This compares the results of obfuscation with the results that would have occured without obfuscation when fine tuning the model.
-
Ensure to change the file names in results_evaluator to match the files you wish to comapare.
-
Example: /home/[user_name]/Mitigating_Backdoor_Attacks_in_LLMs/adversarial-backdoor-for-code-models/CodeT5/sh/ saved_models/summarize-adv-0.05/python/codet5_small_all_lr5_bs32_src256_trg128_pat2_e15/prediction/dev_e0.output
-
Note that renaming_results is simply a collection of our results, and not the destination of the results.
-
Run the following command to evaluate the results after obfuscation when fine tuning the model:
-
python results_evaluator.py
-
formatted_evaluator.py evaluates all if and while conditions in a normalised dataset of fully functional python programs.
-
Run the following command to run evaluation on a small normalised dataset:
-
python formatted_evaluator.py
- Ubuntu Pro 18.04
- We use a GCP virtual machine with the following settings:
- Machine Type: Intel Haswell n1-standard-16 (16 vCPU, 8 core, 60GB Memory)
- GPU: NVIDIA Tesla P100
- Image: ubuntu-pro-1804-bionic-v20230711
| Name | Version | Build | Channel |
|---|---|---|---|
| _libgcc_mutex | 0.1 | main | |
| _openmp_mutex | 5.1 | 1_gnu | |
| absl-py | 2.0.0 | pypi_0 | pypi |
| ca-certificates | 2023.08.22 | h06a4308_0 | |
| cachetools | 4.2.4 | pypi_0 | pypi |
| certifi | 2022.12.7 | py37h06a4308_0 | |
| charset-normalizer | 3.3.0 | pypi_0 | pypi |
| click | 8.1.7 | pypi_0 | pypi |
| filelock | 3.12.2 | pypi_0 | pypi |
| google-auth | 1.35.0 | pypi_0 | pypi |
| google-auth-oauthlib | 0.4.6 | pypi_0 | pypi |
| grpcio | 1.59.0 | pypi_0 | pypi |
| huggingface-hub | 0.0.8 | pypi_0 | pypi |
| idna | 3.4 | pypi_0 | pypi |
| importlib-metadata | 6.7.0 | pypi_0 | pypi |
| joblib | 1.3.2 | pypi_0 | pypi |
| ld_impl_linux-64 | 2.38 | h1181459_1 | |
| libffi | 3.4.4 | h6a678d5_0 | |
| libgcc-ng | 11.2.0 | h1234567_1 | |
| libgomp | 11.2.0 | h1234567_1 | |
| libstdcxx-ng | 11.2.0 | h1234567_1 | |
| markdown | 3.4.4 | pypi_0 | pypi |
| markupsafe | 2.1.3 | pypi_0 | pypi |
| ncurses | 6.4 | h6a678d5_0 | |
| nltk | 3.8.1 | pypi_0 | pypi |
| numpy | 1.19.5 | pypi_0 | pypi |
| oauthlib | 3.2.2 | pypi_0 | pypi |
| openssl | 1.1.1w | h7f8727e_0 | |
| packaging | 23.2 | pypi_0 | pypi |
| pip | 22.3.1 | py37h06a4308_0 | |
| protobuf | 3.20.0 | pypi_0 | pypi |
| pyasn1 | 0.5.0 | pypi_0 | pypi |
| pyasn1-modules | 0.3.0 | pypi_0 | pypi |
| python | 3.7.16 | h7a1cb2a_0 | |
| readline | 8.2 | h5eee18b_0 | |
| regex | 2023.10.3 | pypi_0 | pypi |
| requests | 2.31.0 | pypi_0 | pypi |
| requests-oauthlib | 1.3.1 | pypi_0 | pypi |
| rsa | 4.9 | pypi_0 | pypi |
| sacremoses | 0.0.53 | pypi_0 | pypi |
| setuptools | 65.6.3 | py37h06a4308_0 | |
| six | 1.16.0 | pypi_0 | pypi |
| sqlite | 3.41.2 | h5eee18b_0 | |
| tensorboard | 2.4.1 | pypi_0 | pypi |
| tensorboard-plugin-wit | 1.8.1 | pypi_0 | pypi |
| tk | 8.6.12 | h1ccaba5_0 | |
| tokenizers | 0.10.3 | pypi_0 | pypi |
| torch | 1.7.1 | pypi_0 | pypi |
| tqdm | 4.66.1 | pypi_0 | pypi |
| transformers | 4.6.1 | pypi_0 | pypi |
| tree-sitter | 0.2.2 | pypi_0 | pypi |
| typing-extensions | 4.7.1 | pypi_0 | pypi |
| urllib3 | 2.0.7 | pypi_0 | pypi |
| werkzeug | 2.2.3 | pypi_0 | pypi |
| wheel | 0.38.4 | py37h06a4308_0 | |
| xz | 5.4.2 | h5eee18b_0 | |
| zipp | 3.15.0 | pypi_0 | pypi |
| zlib | 1.2.13 | h5eee18b_0 |