tue-robotics/tue_nn_tools

YOLOv8 Training & Evaluation Pipeline

A pipeline for retraining YOLOv8 models on custom datasets and evaluating/visualizing model performance.

Installation

pip install -r requirements.txt

Pipeline 1 — Retraining (run_pipeline.py)

Merges datasets, validates data, trains a YOLOv8 model on COCO + custom classes, evaluates it, and saves a report.

Configuration

All training settings are split across two files:

config.yaml — top-level pipeline config:

| Key | Description |
|---|---|
| `base_model` | Base weights to fine-tune (e.g. `yolov8n.pt`) |
| `experiment_name` | Name of the run folder under `runs/` |
| `train_config` | Path to training hyperparameters file |
| `merge.datasets` | List of dataset paths to merge |
| `custom_classes` | Custom class IDs and names appended after COCO's 80 |
| `dataset_config` | Path to the merged `data.yaml` used for training/eval |
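Putting those keys together, a minimal `config.yaml` might look like this (all paths, names, and class IDs below are illustrative, not taken from the repo):

```yaml
base_model: yolov8n.pt
experiment_name: custom_run_v1
train_config: configs/train.yaml
merge:
  datasets:
    - datasets/coco_subset
    - datasets/custom_objects
custom_classes:          # appended after COCO's 80 built-in classes
  80: traffic_cone
  81: pallet
dataset_config: datasets/merged/data.yaml
```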

configs/train.yaml — training hyperparameters:

| Key | Description |
|---|---|
| `epochs` | Number of training epochs |
| `batch` | Batch size |
| `imgsz` | Input image size |
| `lr0` | Initial learning rate |
| `freeze` | Number of backbone layers to freeze |
| `patience` | Early stopping patience |
| `device` | GPU device index or `"cpu"` |
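A `configs/train.yaml` using these keys could look like the sketch below (the values are common YOLOv8 fine-tuning defaults, not the repo's actual settings):

```yaml
epochs: 100
batch: 16
imgsz: 640
lr0: 0.01
freeze: 10      # freeze the first 10 backbone layers
patience: 20    # stop if no val improvement for 20 epochs
device: 0       # GPU index, or "cpu"
```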

Running the pipeline

python run_pipeline.py

The pipeline runs these steps automatically:

  1. Dataset validation — checks the merged dataset for missing labels or corrupt images
  2. Training — fine-tunes the base model using configs/train.yaml
  3. Evaluation — computes mAP50, mAP50-95 on the validation set
  4. Model comparison — compares trained model against the base model
  5. Report — saves a human-readable report to runs/reports/

Trained model weights are saved to models/model_<timestamp>.pt and models/latest.pt.
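The timestamped weight naming could be produced with a small helper like this (a sketch; the pipeline's actual timestamp format is an assumption):

```python
from datetime import datetime
from pathlib import Path


def timestamped_model_path(models_dir: str = "models") -> Path:
    """Build a models/model_<timestamp>.pt path; the timestamp format is assumed."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(models_dir) / f"model_{stamp}.pt"


path = timestamped_model_path()
print(path)  # e.g. models/model_20240101_120000.pt
```

Copying the same weights to `models/latest.pt` then gives downstream scripts a stable path that always points at the most recent run.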


Pipeline 2 — Evaluation & Visualization (evaluate_pipeline.py)

Evaluates and compares multiple YOLO models on a dataset, then runs inference on images and saves annotated results to disk.

Configuration

Edit config_evaluate.yaml to change any setting:

| Key | Description |
|---|---|
| `dataset` | Path to the COCO-compatible `.yaml` used for `model.val()`. Must match the model's class space (80 COCO classes) |
| `openimages_dir` | Local folder where the Open Images v7 subset is downloaded and exported. Used only for visualization |
| `img_size` | Inference image size |
| `conf` | Confidence threshold (0–1). Raise to reduce false positives |
| `iou` | IoU threshold for NMS |
| `device` | GPU index or `"cpu"` |
| `model_paths` | Dict of model names and their `.pt` paths to compare |
| `reports_dir` | Folder where `evaluation_results.json` is saved |
| `output_dir` | Folder where annotated visualization images are saved |
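For reference, a `config_evaluate.yaml` with these keys might look roughly like this (paths and thresholds below are illustrative assumptions):

```yaml
dataset: datasets/coco128/coco128.yaml
openimages_dir: datasets/openimages_v7
img_size: 640
conf: 0.25
iou: 0.45
device: 0
model_paths:
  baseline: models/yolov8n.pt
  finetuned: models/latest.pt
reports_dir: evaluation_reports
output_dir: output_predictions
```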

Two datasets are used for different purposes:

  • dataset (coco128) — used for model.val() metrics. Must be class-compatible with the models (80 COCO classes).
  • openimages_dir — 500 validation images auto-downloaded from Open Images v7 via FiftyOne, used only for inference visualization. These have different class labels and cannot be used for evaluation.

Tip: If predictions show too many overlapping boxes, increase conf (e.g. 0.5).

Running the evaluation

python evaluate_pipeline.py

This script will:

  1. Download & export 500 Open Images v7 validation images via FiftyOne into openimages_dir in YOLO format (skipped automatically if already present on disk)
  2. Evaluate each model listed in model_paths against dataset (coco128), printing mAP50, mAP50-95, precision, recall, and F1
  3. Compare models and report which performs best per metric
  4. Save results to evaluation_reports/evaluation_results.json
  5. Run inference on the downloaded Open Images using yolo26n.pt, drawing bounding boxes and labels with per-class colors, and saving annotated images to output_dir
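The per-metric comparison in step 3 can be sketched as a small helper (names and structure here are hypothetical; the script's internals may differ):

```python
def best_per_metric(results: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each metric, return the name of the model with the highest value."""
    metrics = next(iter(results.values()))  # metric names from the first model
    return {m: max(results, key=lambda name: results[name][m]) for m in metrics}


# Example with made-up scores for two models:
scores = {
    "yolov8n": {"mAP50": 0.61, "mAP50-95": 0.44},
    "finetuned": {"mAP50": 0.66, "mAP50-95": 0.47},
}
print(best_per_metric(scores))  # {'mAP50': 'finetuned', 'mAP50-95': 'finetuned'}
```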

Project Structure

train_yolov8/
├── run_pipeline.py          # Retraining pipeline
├── evaluate_pipeline.py     # Evaluation & visualization pipeline
├── config.yaml              # Training pipeline config
├── config_evaluate.yaml     # Evaluation pipeline config
├── configs/
│   ├── train.yaml           # Training hyperparameters
│   └── augment.yaml         # Augmentation settings
├── datasets/                # Input datasets
├── models/                  # Saved model weights
├── runs/                    # Training runs and reports
├── evaluation_reports/      # Evaluation JSON results
├── output_predictions/      # Annotated inference images
├── src/
│   ├── training/train.py
│   ├── eval/evaluate.py
│   └── data/validate.py
└── requirements.txt

About

A TU/e neural network tools and utilities repo, used for training pipelines and general prototyping.
