tue-robotics/tue_nn_tools

YOLOv8 Training & Evaluation Pipeline

A pipeline for retraining YOLOv8 models on custom datasets and evaluating/visualizing model performance.

Installation

pip install -r requirements.txt

Pipeline 1 — Retraining (run_pipeline.py)

Merges datasets, validates data, trains a YOLOv8 model on COCO + custom classes, evaluates it, and saves a report.

Configuration

All training settings are split across two files:

config.yaml — top-level pipeline config:

| Key | Description |
|---|---|
| `base_model` | Base weights to fine-tune (e.g. `yolov8n.pt`) |
| `experiment_name` | Name of the run folder under `runs/` |
| `train_config` | Path to training hyperparameters file |
| `merge.datasets` | List of dataset paths to merge |
| `custom_classes` | Custom class IDs and names appended after COCO's 80 |
| `dataset_config` | Path to the merged `data.yaml` used for training/eval |
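Putting those keys together, a minimal `config.yaml` might look like this (all paths, names, and class IDs below are illustrative, not taken from the repo):

```yaml
base_model: yolov8n.pt
experiment_name: custom_run_v1
train_config: configs/train.yaml
merge:
  datasets:
    - datasets/coco_subset
    - datasets/custom_objects
custom_classes:          # appended after COCO's 80 built-in classes
  80: traffic_cone
  81: pallet
dataset_config: datasets/merged/data.yaml
```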

configs/train.yaml — training hyperparameters:

| Key | Description |
|---|---|
| `epochs` | Number of training epochs |
| `batch` | Batch size |
| `imgsz` | Input image size |
| `lr0` | Initial learning rate |
| `freeze` | Number of backbone layers to freeze |
| `patience` | Early stopping patience |
| `device` | GPU device index or `"cpu"` |
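A `configs/train.yaml` using these keys could look like the sketch below (the values are common YOLOv8 fine-tuning defaults, not the repo's actual settings):

```yaml
epochs: 100
batch: 16
imgsz: 640
lr0: 0.01
freeze: 10      # freeze the first 10 backbone layers
patience: 20    # stop if no val improvement for 20 epochs
device: 0       # GPU index, or "cpu"
```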

Running the pipeline

python run_pipeline.py

The pipeline runs these steps automatically:

  1. Dataset validation — checks the merged dataset for missing labels or corrupt images
  2. Training — fine-tunes the base model using configs/train.yaml
  3. Evaluation — computes mAP50, mAP50-95 on the validation set
  4. Model comparison — compares trained model against the base model
  5. Report — saves a human-readable report to runs/reports/

Trained model weights are saved to models/model_<timestamp>.pt and models/latest.pt.
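The timestamped weight naming could be produced with a small helper like this (a sketch; the pipeline's actual timestamp format is an assumption):

```python
from datetime import datetime
from pathlib import Path


def timestamped_model_path(models_dir: str = "models") -> Path:
    """Build a models/model_<timestamp>.pt path; the timestamp format is assumed."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(models_dir) / f"model_{stamp}.pt"


path = timestamped_model_path()
print(path)  # e.g. models/model_20240101_120000.pt
```

Copying the same weights to `models/latest.pt` then gives downstream scripts a stable path that always points at the most recent run.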


Pipeline 2 — Evaluation & Visualization (evaluate_pipeline.py)

Evaluates and compares multiple YOLO models on a dataset, then runs inference on images and saves annotated results to disk.

Configuration

Edit config_evaluate.yaml to change any setting:

| Key | Description |
|---|---|
| `dataset` | Path to the COCO-compatible `.yaml` used for `model.val()`. Must match the model's class space (80 COCO classes) |
| `openimages_dir` | Local folder where the Open Images v7 subset is downloaded and exported. Used only for visualization |
| `img_size` | Inference image size |
| `conf` | Confidence threshold (0–1). Raise to reduce false positives |
| `iou` | IoU threshold for NMS |
| `device` | GPU index or `"cpu"` |
| `model_paths` | Dict of model names and their `.pt` paths to compare |
| `reports_dir` | Folder where `evaluation_results.json` is saved |
| `output_dir` | Folder where annotated visualization images are saved |
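For reference, a `config_evaluate.yaml` with these keys might look roughly like this (paths and thresholds below are illustrative assumptions):

```yaml
dataset: datasets/coco128/coco128.yaml
openimages_dir: datasets/openimages_v7
img_size: 640
conf: 0.25
iou: 0.45
device: 0
model_paths:
  baseline: models/yolov8n.pt
  finetuned: models/latest.pt
reports_dir: evaluation_reports
output_dir: output_predictions
```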

Two datasets are used for different purposes:

  • dataset (coco128) — used for model.val() metrics. Must be class-compatible with the models (80 COCO classes).
  • openimages_dir — 500 validation images auto-downloaded from Open Images v7 via FiftyOne, used only for inference visualization. These have different class labels and cannot be used for evaluation.

Tip: If predictions show too many overlapping boxes, increase conf (e.g. 0.5).

Running the evaluation

python evaluate_pipeline.py

This script will:

  1. Download & export 500 Open Images v7 validation images via FiftyOne into openimages_dir in YOLO format (skipped automatically if already present on disk)
  2. Evaluate each model listed in model_paths against dataset (coco128), printing mAP50, mAP50-95, precision, recall, and F1
  3. Compare models and report which performs best per metric
  4. Save results to evaluation_reports/evaluation_results.json
  5. Run inference on the downloaded Open Images using yolo26n.pt, drawing bounding boxes and labels with per-class colors, and saving annotated images to output_dir
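The per-metric comparison in step 3 can be sketched as a small helper (names and structure here are hypothetical; the script's internals may differ):

```python
def best_per_metric(results: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each metric, return the name of the model with the highest value."""
    metrics = next(iter(results.values()))  # metric names from the first model
    return {m: max(results, key=lambda name: results[name][m]) for m in metrics}


# Example with made-up scores for two models:
scores = {
    "yolov8n": {"mAP50": 0.61, "mAP50-95": 0.44},
    "finetuned": {"mAP50": 0.66, "mAP50-95": 0.47},
}
print(best_per_metric(scores))  # {'mAP50': 'finetuned', 'mAP50-95': 'finetuned'}
```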

Project Structure

train_yolov8/
├── run_pipeline.py          # Retraining pipeline
├── evaluate_pipeline.py     # Evaluation & visualization pipeline
├── config.yaml              # Training pipeline config
├── config_evaluate.yaml     # Evaluation pipeline config
├── configs/
│   ├── train.yaml           # Training hyperparameters
│   └── augment.yaml         # Augmentation settings
├── datasets/                # Input datasets
├── models/                  # Saved model weights
├── runs/                    # Training runs and reports
├── evaluation_reports/      # Evaluation JSON results
├── output_predictions/      # Annotated inference images
├── src/
│   ├── training/train.py
│   ├── eval/evaluate.py
│   └── data/validate.py
└── requirements.txt

About

A TU/e neural network tools and utilities repo, used for training pipelines and general prototyping.
