"Teaching Language Models to 'Read' Chemistry."
ChemLLM-Adapter is a fine-tuning engine designed to bridge the gap between generative AI and computational toxicology. By leveraging QLoRA (Quantized Low-Rank Adaptation), it adapts the 7B-parameter OLMo model to predict molecular toxicity directly from chemical sequences (SMILES), bringing drug-discovery research within reach of consumer hardware.
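As a rough sketch, the QLoRA setup looks like the following. The quantization type (NF4) and adapter hyperparameters (r=32, alpha=64) come from the feature table below; the `target_modules` list, dropout value, and compute dtype are assumptions, not confirmed project settings.

```python
# Sketch of the QLoRA setup: 4-bit NF4 quantization plus trainable LoRA
# adapters, using the standard transformers + peft APIs.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NormalFloat quantization compresses the frozen base weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Small trainable LoRA adapters are attached on top of the quantized model.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,                      # assumption
    target_modules=["q_proj", "v_proj"],    # assumption: attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

With this configuration only the adapter weights receive gradients, which is what makes training a 7B model tractable on a single consumer GPU.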
| Feature | The Tech Behind It |
|---|---|
| 🧠 QLoRA Fine-Tuning | Implements 4-bit NormalFloat (NF4) quantization to compress the 7B model, while attaching trainable LoRA adapters (r=32, alpha=64) for efficient learning. |
| 🧪 Molecular Tokenization | Converts raw SMILES strings (e.g., `C(=O)O`) into structured semantic prompts: `Molecule: [SMILES]\nTask: [Assay]\nIs Toxic: ?` |
| ⚖️ Smart Balancing | The `dataset.py` engine automatically handles the severe class imbalance in Tox21 by applying 3x oversampling to the rare toxic samples. |
| 🛡️ DeepChem Integration | Uses RDKit and DeepChem loaders to validate molecular integrity before ingestion into the neural network. |
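The tokenization and balancing steps above can be sketched in plain Python. The prompt template matches the table; the function names and record layout (`(smiles, assay, label)` tuples) are illustrative assumptions, not the actual `dataset.py` interface.

```python
# Minimal sketch of the dataset-side logic: format a SMILES record into the
# training prompt, then oversample toxic examples 3x to counter imbalance.

def build_prompt(smiles: str, assay: str) -> str:
    """Turn a raw SMILES string and assay name into the prompt template."""
    return f"Molecule: {smiles}\nTask: {assay}\nIs Toxic:"

def oversample_toxic(records, factor=3):
    """Repeat toxic records (label == 1) `factor` times; keep others as-is."""
    balanced = []
    for smiles, assay, label in records:
        copies = factor if label == 1 else 1
        balanced.extend([(smiles, assay, label)] * copies)
    return balanced

# Toy example: one non-toxic and one toxic molecule for a Tox21 assay.
data = [("C(=O)O", "NR-AhR", 0), ("c1ccccc1N", "NR-AhR", 1)]
balanced = oversample_toxic(data)
# The single toxic sample now appears 3 times alongside 1 non-toxic sample.
```

Oversampling at the record level (rather than reweighting the loss) keeps the training loop unchanged, at the cost of a larger epoch.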
Visualizing the Data Flow from Molecule to Prediction:
```mermaid
graph TD
    A[🧪 Tox21 Raw Data] -->|Validation| B(dataset.py)
    B -->|Tokenize| C{Molecules}
    D[🤖 OLMo-7B Model] -->|Quantize| E(model.py)
    C & E -->|Train Loop| F[train.py]
    F --> G[🚀 Fine-Tuned Adapter]
```