A clean, reproducible Self-Supervised Learning (SSL) project that demonstrates SimCLR pretraining on the STL-10 unlabeled split (100k images) and evaluates the learned representations with standard protocols:
- kNN@K on frozen embeddings
- Linear probe (frozen encoder + linear classifier)
- UMAP visualization of embedding space
- Nearest-neighbor retrieval in embedding space (cosine similarity)
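The kNN protocol can be sketched compactly. The following is an illustrative NumPy version (not the repo's `src/eval_knn.py`): embeddings are L2-normalized so cosine similarity reduces to a dot product, and each test point is classified by majority vote over its k nearest training embeddings.

```python
import numpy as np

def knn_predict(train_emb, train_labels, test_emb, k=20):
    """Classify test embeddings by majority vote over the k most
    cosine-similar training embeddings (frozen encoder, no training)."""
    # L2-normalize so the dot product equals cosine similarity
    train_emb = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test_emb = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    sim = test_emb @ train_emb.T                  # (n_test, n_train)
    topk = np.argsort(-sim, axis=1)[:, :k]        # indices of k nearest
    votes = train_labels[topk]                    # (n_test, k) label votes
    return np.array([np.bincount(v).argmax() for v in votes])
```

Because the encoder is frozen, this protocol measures how linearly clustered the learned embedding space already is, with no learned classifier at all.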
This repo is designed to be portfolio-ready:
- runs on a single GPU (e.g., RTX 2070),
- produces structured artifacts (logs / checkpoints / metrics),
- keeps training in scripts and analysis in notebooks.
Strong SimCLR run: simclr_version_4 (50 epochs)
- kNN@20 accuracy: 0.7405
- Linear-probe accuracy (20 epochs): 0.7360
All results are reproducible from artifacts in `artifacts/` and summarized in:
- `artifacts/metrics/runs_index.csv`
- `artifacts/metrics/summary.csv`
- Train SimCLR on STL-10 unlabeled using `src/train_ssl.py`
- Logs go to `artifacts/logs/…`
- Checkpoints go to `artifacts/checkpoints/…`
- Run registry + aggregated metrics go to `artifacts/metrics/…`
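For context, SimCLR's pretraining objective (the NT-Xent contrastive loss) can be sketched in NumPy. This is an illustration of the loss itself, not the implementation in `src/losses/`; it assumes a batch of 2N embeddings where rows i and i+N are two augmented views of the same image.

```python
import numpy as np

def nt_xent(z, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross entropy) over 2N
    embeddings, where rows i and i+N are the two views of sample i.
    Numerically naive softmax; fine for a small sketch."""
    n2 = z.shape[0]
    n = n2 // 2
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N) logits
    np.fill_diagonal(sim, -np.inf)                     # mask self-similarity
    # index of each row's positive pair: i <-> i+N
    pos = np.concatenate([np.arange(n, n2), np.arange(0, n)])
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logprob[np.arange(n2), pos].mean()
```

The loss is minimized when each pair of views is more similar to each other than to every other embedding in the batch, which is what pushes the encoder toward augmentation-invariant features.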
- `src/eval_knn.py` — kNN on frozen embeddings (STL-10 train → test)
- `src/eval_linear.py` — linear probe on frozen embeddings
- `01_augmentations_preview.ipynb` — why augmentations matter in SSL
- `02_experiments_report_fixed.ipynb` — training curves / loss analysis
- `03_umap_embeddings_fixed.ipynb` — compute embeddings + UMAP visualization
- `04_retrieval_demo.ipynb` — nearest-neighbor retrieval + Hit@10 sanity check
- `05_ssl_final_simclr.ipynb` — final showcase (all key results in one notebook)
Option A: conda (recommended)

```bash
conda env create -f environment.yml
conda activate ssl_env
```

Option B: pip

```bash
python -m venv .venv
# Windows (PowerShell):
.\.venv\Scripts\Activate.ps1
# Windows (cmd):
.venv\Scripts\activate.bat
# Git Bash:
source .venv/Scripts/activate
pip install -r requirements.txt
```

Train SimCLR on STL-10 unlabeled:

```bash
python -m src.train_ssl --config configs/simclr_r18_stl10_strong.yaml
```

This will create:
- `artifacts/logs/simclr/version_X/metrics.csv`
- `artifacts/checkpoints/simclr/simclr_version_X/{last.ckpt,best.ckpt}`
- updated `artifacts/metrics/runs_index.csv` and `artifacts/metrics/summary.csv`
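The run registry is just a CSV that grows one row per training run. A minimal sketch of how such a registry can be appended to (the column names here are hypothetical; the repo's actual schema may differ):

```python
import csv
from pathlib import Path

def register_run(index_path, run_id, config, epochs):
    """Append one row to a run-registry CSV, writing the header only when
    the file is first created. (Illustrative sketch, not the repo's code.)"""
    path = Path(index_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["run_id", "config", "epochs"])
        if is_new:
            writer.writeheader()
        writer.writerow({"run_id": run_id, "config": config, "epochs": epochs})
```

Keeping the registry append-only means every historical run stays queryable from a single file, which is what makes the summary CSVs reproducible.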
kNN@20:

```bash
python -m src.eval_knn --project-root . --k 20 --use best
```

Linear probe (20 epochs):

```bash
python -m src.eval_linear --project-root . --epochs 20 --use best
```

After running these scripts, open `artifacts/metrics/summary.csv` (updated with `knn_acc` and `linear_acc`).
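Conceptually, the linear probe trains only a softmax classifier on top of frozen embeddings; the encoder itself is never updated. A self-contained NumPy sketch of that idea (not the repo's `src/eval_linear.py`, which works on real STL-10 features):

```python
import numpy as np

def linear_probe(emb, labels, epochs=200, lr=0.5):
    """Fit a linear softmax classifier on frozen embeddings with
    full-batch gradient descent on the cross-entropy loss."""
    n, d = emb.shape
    c = labels.max() + 1
    W = np.zeros((d, c))
    onehot = np.eye(c)[labels]
    for _ in range(epochs):
        logits = emb @ W
        logits -= logits.max(axis=1, keepdims=True)  # numeric stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * emb.T @ (p - onehot) / n           # cross-entropy gradient
    return W

def probe_accuracy(W, emb, labels):
    return float(((emb @ W).argmax(axis=1) == labels).mean())
```

Because only the linear layer is trained, probe accuracy measures how linearly separable the classes already are in the frozen SSL embedding space.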
Run Jupyter and open:
notebooks/05_ssl_final_simclr.ipynb
This notebook reproduces:
- training curves (loss),
- kNN + linear-probe metrics,
- UMAP embeddings,
- retrieval demo + Hit@10 sanity metric.
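The Hit@10 sanity metric counts, for each query image, whether any of its 10 nearest neighbors (by cosine similarity, excluding the query itself) shares the query's class. A minimal NumPy sketch of that idea (not the notebook's actual code):

```python
import numpy as np

def hit_at_k(emb, labels, k=10):
    """Fraction of queries for which at least one of the k nearest
    cosine neighbors (self excluded) has the same label."""
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)                 # exclude self-match
    topk = np.argsort(-sim, axis=1)[:, :k]
    hits = (labels[topk] == labels[:, None]).any(axis=1)
    return float(hits.mean())
```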
- Paths are handled relative to the project root (`PROJECT_ROOT` in notebooks).
- The repo stores:
  - checkpoints (`best.ckpt`, `last.ckpt`)
  - metrics logs (`metrics.csv`)
  - run registry (`runs_index.csv`)
  - aggregated summary (`summary.csv`)
- Notebooks are analysis-only: they do not train models.
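One common way to resolve the project root from inside a notebook is to walk upward from the working directory until a marker directory is found. This helper is hypothetical (the repo's notebooks may set `PROJECT_ROOT` differently), using `artifacts/` as the marker:

```python
from pathlib import Path

def find_project_root(start=None):
    """Walk upward from `start` (default: cwd) until a directory that
    contains an `artifacts/` folder is found. Hypothetical helper."""
    p = Path(start or Path.cwd()).resolve()
    for candidate in [p, *p.parents]:
        if (candidate / "artifacts").is_dir():
            return candidate
    raise FileNotFoundError("no ancestor directory contains artifacts/")
```

This keeps notebook paths stable no matter whether Jupyter is launched from the repo root or from `notebooks/`.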
```text
├── artifacts/
│   ├── checkpoints/
│   ├── embeddings/
│   ├── figures/
│   ├── logs/
│   └── metrics/
├── configs/
├── data/
├── lightning_logs/   # optional (legacy Lightning dir if used)
├── notebooks/
└── src/
    ├── data/
    ├── losses/
    ├── models/
    └── utils/
```
- Add BYOL (non-contrastive SSL) and compare side-by-side with the same metrics (kNN + linear probe + UMAP + retrieval).
- Add FAISS indexing for large-scale retrieval (engineering upgrade; not required for this portfolio version).
- SimCLR: Chen et al., 2020 — A Simple Framework for Contrastive Learning of Visual Representations
- STL-10 dataset: Coates et al., 2011
- PyTorch Lightning for clean training loops
MIT License. See LICENSE for details.