Abu-Sameer-66/ChemLLM-Tox-OLMo
🧬 Project Mission

"Teaching Language Models to 'Read' Chemistry."

ChemLLM-Adapter is a fine-tuning engine that bridges the gap between generative AI and computational toxicology. Using QLoRA (Quantized Low-Rank Adaptation), it adapts the 7-billion-parameter OLMo-7B model to predict molecular toxicity directly from chemical sequences (SMILES), making drug-discovery research practical on consumer hardware.


🔬 Core Engineering Innovations

| Feature | The Tech Behind It |
| --- | --- |
| 🧠 QLoRA Fine-Tuning | Implements 4-bit NormalFloat (nf4) quantization to compress the 7B model, while attaching trainable LoRA adapters (r=32, alpha=64) for efficient learning. |
| 🧪 Molecular Tokenization | Converts raw chemical formulas (e.g., `C(=O)O`) into structured semantic prompts: `Molecule: [SMILES] \nTask: [Assay] \nIs Toxic: ?` |
| ⚖️ Smart Balancing | The `dataset.py` engine automatically handles the severe class imbalance in Tox21 by applying 3x oversampling to rare toxic samples. |
| 🛡️ DeepChem Integration | Uses RDKit and DeepChem loaders to validate molecular integrity before ingestion into the neural network. |
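The prompt template from the tokenization row above can be sketched as a small helper. This is a hypothetical function (`build_prompt` is not named in the README), and the exact whitespace around the newlines is an assumption read off the template string:

```python
def build_prompt(smiles: str, assay: str) -> str:
    """Format one Tox21 record into the semantic prompt template.

    Hypothetical helper; the field layout follows the README template
    `Molecule: [SMILES] \nTask: [Assay] \nIs Toxic: ?`.
    """
    return f"Molecule: {smiles}\nTask: {assay}\nIs Toxic: ?"


# Example: formic acid against the NR-AR assay
print(build_prompt("C(=O)O", "NR-AR"))
```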

⚙️ System Architecture

Visualizing the Data Flow from Molecule to Prediction:

```mermaid
graph TD
    A[🧪 Tox21 Raw Data] -->|Validation| B(dataset.py)
    B -->|Tokenize| C{Molecules}
    D[🤖 OLMo-7B Model] -->|Quantize| E(model.py)
    C & E -->|Train Loop| F[train.py]
    F --> G[🚀 Fine-Tuned Adapter]
```
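The Quantize step in the diagram pairs the nf4 base-model compression with the r=32 / alpha=64 LoRA adapters from the feature table. A minimal configuration sketch using `transformers` and `peft` follows; the `target_modules` names and the dropout value are assumptions (not stated in this README), and the real `model.py` may differ:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NormalFloat (nf4) quantization for the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Trainable LoRA adapters with the r=32 / alpha=64 settings from the table
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,  # assumption: dropout is not specified in the README
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["att_proj", "ff_proj"],  # assumption: OLMo layer names
)

# The configs would then be applied roughly like this:
# model = AutoModelForCausalLM.from_pretrained(
#     "allenai/OLMo-7B", quantization_config=bnb_config)
# model = get_peft_model(model, lora_config)
```

Only the LoRA adapter weights are trained; the quantized 7B backbone stays frozen, which is what keeps the memory footprint within consumer-GPU range.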
