A pipeline for retraining YOLOv8 models on custom datasets and evaluating/visualizing model performance.
Install dependencies:

```shell
pip install -r requirements.txt
```

The pipeline merges datasets, validates data, trains a YOLOv8 model on COCO + custom classes, evaluates it, and saves a report.
All training settings are split across two files:
`config.yaml` — top-level pipeline config:

| Key | Description |
|---|---|
| `base_model` | Base weights to fine-tune (e.g. `yolov8n.pt`) |
| `experiment_name` | Name of the run folder under `runs/` |
| `train_config` | Path to the training hyperparameters file |
| `merge.datasets` | List of dataset paths to merge |
| `custom_classes` | Custom class IDs and names appended after COCO's 80 |
| `dataset_config` | Path to the merged `data.yaml` used for training/eval |
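The keys above might be combined in a `config.yaml` like the following sketch. All paths, names, and class labels here are illustrative placeholders, not values from the repository:

```yaml
# Illustrative config.yaml — key names follow the table above; values are examples
base_model: yolov8n.pt
experiment_name: exp_custom_v1
train_config: configs/train.yaml
merge:
  datasets:
    - datasets/coco128
    - datasets/my_custom_set
custom_classes:
  80: forklift        # appended after COCO's 80 built-in classes
  81: pallet
dataset_config: datasets/merged/data.yaml
```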
`configs/train.yaml` — training hyperparameters:

| Key | Description |
|---|---|
| `epochs` | Number of training epochs |
| `batch` | Batch size |
| `imgsz` | Input image size |
| `lr0` | Initial learning rate |
| `freeze` | Number of backbone layers to freeze |
| `patience` | Early stopping patience |
| `device` | GPU device index or `"cpu"` |
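A `configs/train.yaml` built from those keys could look like this sketch; the values are examples, not recommendations from the project:

```yaml
# Illustrative configs/train.yaml — values are placeholders
epochs: 100
batch: 16
imgsz: 640
lr0: 0.01
freeze: 10       # freeze the first 10 backbone layers
patience: 20     # stop if no val improvement for 20 epochs
device: 0        # GPU index, or "cpu"
```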
Run the pipeline:

```shell
python run_pipeline.py
```

The pipeline runs these steps automatically:
- Dataset validation — checks the merged dataset for missing labels or corrupt images
- Training — fine-tunes the base model using `configs/train.yaml`
- Evaluation — computes mAP50 and mAP50-95 on the validation set
- Model comparison — compares the trained model against the base model
- Report — saves a human-readable report to `runs/reports/`
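The validation step can be sketched roughly as below. This is a hypothetical stdlib-only check (not the repository's `src/data/validate.py`), assuming the usual YOLO layout of an `images/` folder with a sibling `labels/` folder of `.txt` files:

```python
from pathlib import Path

# Magic bytes used as a cheap "is this file really an image?" check.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def validate_dataset(images_dir, labels_dir):
    """Return a list of problems: images missing labels, or corrupt/empty images."""
    problems = []
    for img in sorted(Path(images_dir).glob("*")):
        if img.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        label = Path(labels_dir) / (img.stem + ".txt")
        if not label.exists():
            problems.append(f"missing label: {img.name}")
        head = img.read_bytes()[:8]
        if not (head.startswith(JPEG_MAGIC) or head.startswith(PNG_MAGIC)):
            problems.append(f"corrupt image: {img.name}")
    return problems
```

A real validator would also parse each label line (class id plus four normalized box coordinates), but the file-level checks above catch the two failure modes the pipeline reports.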
Trained model weights are saved to `models/model_<timestamp>.pt` and `models/latest.pt`.
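The timestamped-copy-plus-`latest.pt` convention amounts to two file copies. A minimal sketch (illustrative, not the pipeline's actual code):

```python
import shutil
import time
from pathlib import Path

def save_weights(best_ckpt, models_dir="models"):
    """Copy a checkpoint to models/model_<timestamp>.pt and refresh models/latest.pt."""
    models_dir = Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    dst = models_dir / f"model_{stamp}.pt"
    shutil.copy2(best_ckpt, dst)                        # permanent timestamped copy
    shutil.copy2(best_ckpt, models_dir / "latest.pt")   # always points at the newest run
    return dst
```

Keeping both means old runs stay reproducible while downstream scripts can always load `models/latest.pt`.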
Evaluates and compares multiple YOLO models on a dataset, then runs inference on images and saves annotated results to disk.
Edit `config_evaluate.yaml` to change any setting:

| Key | Description |
|---|---|
| `dataset` | Path to the COCO-compatible `.yaml` used for `model.val()`. Must match the model's class space (80 COCO classes) |
| `openimages_dir` | Local folder where the Open Images v7 subset is downloaded and exported. Used only for visualization |
| `img_size` | Inference image size |
| `conf` | Confidence threshold (0–1). Raise to reduce false positives |
| `iou` | IoU threshold for NMS |
| `device` | GPU index or `"cpu"` |
| `model_paths` | Dict of model names and their `.pt` paths to compare |
| `reports_dir` | Folder where `evaluation_results.json` is saved |
| `output_dir` | Folder where annotated visualization images are saved |
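Put together, a `config_evaluate.yaml` might look like the following sketch; paths and model names are illustrative placeholders:

```yaml
# Illustrative config_evaluate.yaml — key names follow the table above
dataset: datasets/coco128/coco128.yaml
openimages_dir: datasets/openimages_v7
img_size: 640
conf: 0.25
iou: 0.45
device: 0
model_paths:
  baseline: yolov8n.pt
  retrained: models/latest.pt
reports_dir: evaluation_reports
output_dir: output_predictions
```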
Two datasets are used for different purposes:

- `dataset` (coco128) — used for `model.val()` metrics. Must be class-compatible with the models (80 COCO classes).
- `openimages_dir` — 500 validation images auto-downloaded from Open Images v7 via FiftyOne, used only for inference visualization. These have different class labels and cannot be used for evaluation.
Tip: If predictions show too many overlapping boxes, increase `conf` (e.g. to 0.5).
Run the evaluation:

```shell
python evaluate_pipeline.py
```

This script will:
- Download & export 500 Open Images v7 validation images via FiftyOne into `openimages_dir` in YOLO format (skipped automatically if already present on disk)
- Evaluate each model listed in `model_paths` against `dataset` (coco128), printing mAP50, mAP50-95, precision, recall, and F1
- Compare models and report which performs best per metric
- Save results to `evaluation_reports/evaluation_results.json`
- Run inference on the downloaded Open Images using `yolo26n.pt`, drawing bounding boxes and labels with per-class colors, and saving annotated images to `output_dir`
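One common way to get stable, visually distinct per-class colors is golden-ratio hue spacing. This is a hypothetical helper, not necessarily what the script does:

```python
import colorsys

def class_color(class_id):
    """Map a class id to a stable BGR color by spacing hues around the wheel."""
    hue = (class_id * 0.61803398875) % 1.0  # golden-ratio step keeps neighboring ids distinct
    r, g, b = colorsys.hsv_to_rgb(hue, 0.85, 0.95)
    return int(b * 255), int(g * 255), int(r * 255)  # BGR order for OpenCV drawing
```

Because the color depends only on the class id, the same class gets the same box color across every annotated image.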
```
train_yolov8/
├── run_pipeline.py          # Retraining pipeline
├── evaluate_pipeline.py     # Evaluation & visualization pipeline
├── config.yaml              # Training pipeline config
├── config_evaluate.yaml     # Evaluation pipeline config
├── configs/
│   ├── train.yaml           # Training hyperparameters
│   └── augment.yaml         # Augmentation settings
├── datasets/                # Input datasets
├── models/                  # Saved model weights
├── runs/                    # Training runs and reports
├── evaluation_reports/      # Evaluation JSON results
├── output_predictions/      # Annotated inference images
├── src/
│   ├── training/train.py
│   ├── eval/evaluate.py
│   └── data/validate.py
└── requirements.txt
```