A full end-to-end dose-finding analysis pipeline in R, covering optimal study design, exploratory analysis, MED identification, and PK/PD modelling — from sample size calculation to a safety-adjusted minimum effective dose.
This project simulates and analyses a Phase II dose-finding clinical trial for a hypothetical drug tested at four dose levels (0, 10, 100, and 200 mg). The goal is to identify the Minimum Effective Dose (MED) — the lowest dose that produces a clinically meaningful change in pharmacodynamic effect relative to placebo — while accounting for exposure variability and side-effect risk.
The analysis is structured into four tasks, each building on the last.
- Cohen's d-based sample size calculation using `pwr`
- Built a sigmoidal Emax (Hill) model in `PopED` to characterise the dose–response relationship
- Optimised active dose levels using Adaptive Random Search (ARS) + Local Search (LS)
- Generated model predictions with and without prediction intervals
- Quantified parameter uncertainty via Monte Carlo sampling from the Fisher Information Matrix (FIM)
- Explored sensitivity of dose–response shape across variations in Emax, ED50, and Hill coefficient
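
The design steps above can be sketched in base R. This is a minimal illustration rather than the project's script: it uses `stats::power.t.test` in place of `pwr`, and the effect size and the Emax, ED50, and Hill values are hypothetical placeholders chosen only to show the mechanics.

```r
# Sample size per arm from a standardised effect size (Cohen's d).
# d = 0.5 is a hypothetical value, not taken from the study.
d <- 0.5
ss <- power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80,
                   type = "two.sample", alternative = "one.sided")
n_per_arm <- ceiling(ss$n)

# Sigmoidal Emax (Hill) dose-response function.
sig_emax <- function(dose, e0, emax, ed50, hill) {
  e0 + emax * dose^hill / (ed50^hill + dose^hill)
}

doses <- c(0, 10, 100, 200)  # dose levels from the study design (mg)

# Sensitivity of the curve to the Hill coefficient (hypothetical parameters).
hills <- c(0.5, 1, 2, 4)
sens <- sapply(hills, function(h) {
  sig_emax(doses, e0 = 0, emax = 150, ed50 = 50, hill = h)
})
dimnames(sens) <- list(paste0("dose_", doses), paste0("hill_", hills))

print(n_per_arm)
print(round(sens, 1))
```

Each column of `sens` is one dose-response curve evaluated at the four study doses, so comparing columns shows how a steeper Hill coefficient sharpens the transition around ED50.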
- Loaded and tidied real-format clinical trial data (`data_7.csv`) using `dplyr` and `tidyr`
- Stratified sampling of 200 subjects across four dose groups
- Computed summary statistics: mean, SD, median, trimmed mean, Hodges-Lehmann estimate
- Visualisations: boxplots, histograms, spaghetti plots of effect over time, and a full pairwise correlation matrix with significance annotation (`GGally`)
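
As a rough sketch of the per-dose summary statistics listed above, the following uses base R only. The data are simulated stand-ins, and the column names `dose` and `effect` are assumptions, not necessarily those of `data_7.csv`.

```r
set.seed(1)
# Simulated stand-in for the trial data: 4 dose groups x 50 subjects.
# Column names (dose, effect) and group means are hypothetical.
dat <- data.frame(
  dose   = rep(c(0, 10, 100, 200), each = 50),
  effect = rnorm(200, mean = rep(c(0, 20, 80, 120), each = 50), sd = 30)
)

# One-sample Hodges-Lehmann estimate (pseudomedian), as returned by
# wilcox.test() when conf.int = TRUE.
hodges_lehmann <- function(x) {
  unname(wilcox.test(x, conf.int = TRUE)$estimate)
}

# One row of summary statistics per dose group.
summ <- do.call(rbind, lapply(split(dat$effect, dat$dose), function(x) {
  data.frame(n = length(x), mean = mean(x), sd = sd(x), median = median(x),
             trimmed_mean = mean(x, trim = 0.1), hl = hodges_lehmann(x))
}))
print(round(summ, 2))
```

The trimmed mean and the Hodges-Lehmann estimate are less sensitive to outliers than the plain mean, which is why they are worth reporting next to it.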
- Pairwise dose vs. placebo comparisons using:
  - Welch's t-test (one-sided, µ = 75, α = 0.05)
  - Hodges-Lehmann (Wilcoxon) robust estimator
- Multiple comparison correction via Holm's method
- MED defined as the lowest dose whose 95% lower confidence bound exceeds the target effect (Δ = 75)
- Side-effect analysis using Fisher's Exact Test at each dose level
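
The Welch comparison and MED rule above might look like the following in base R. This is a sketch under stated assumptions: the data are simulated, the group means and SD are hypothetical, and the confidence bounds shown are unadjusted (Holm's correction is applied to the p-values only).

```r
set.seed(42)
# Simulated stand-in data; the group means and SD are hypothetical.
doses <- c(0, 10, 100, 200)
dat <- data.frame(
  dose   = rep(doses, each = 50),
  effect = rnorm(200, mean = rep(c(0, 20, 110, 140), each = 50), sd = 30)
)
delta   <- 75                          # target effect vs. placebo
placebo <- dat$effect[dat$dose == 0]

res <- do.call(rbind, lapply(doses[-1], function(d) {
  active <- dat$effect[dat$dose == d]
  # H0: mean(active) - mean(placebo) <= delta, Welch (unequal variances).
  tt <- t.test(active, placebo, alternative = "greater", mu = delta,
               var.equal = FALSE, conf.level = 0.95)
  data.frame(dose = d,
             diff = unname(tt$estimate[1] - tt$estimate[2]),
             lower95 = tt$conf.int[1],
             p = tt$p.value)
}))
res$p_holm <- p.adjust(res$p, method = "holm")

# MED: lowest dose whose one-sided 95% lower bound exceeds delta.
med <- min(res$dose[res$lower95 > delta])
print(res)
print(med)
```

With `alternative = "greater"`, `t.test` reports a one-sided interval, so `conf.int[1]` is the lower bound that the MED definition compares against Δ.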
- Fitted 12 models total across three exposure metrics (Dose, AUC, Cmax):
  - Simple linear regression
  - Multiple linear regression (with covariates: age, weight, sex, height, side effects)
  - Hyperbolic Emax (no Hill coefficient)
  - Sigmoidal Emax (Hill equation) using `nlsLM`
- Model selection via AIC and BIC
- Best model: Sigmoidal Emax (AUC-based) — supported by lowest AIC/BIC
- Diagnostic plots: observed vs. predicted, residuals vs. fitted, histogram of residuals, QQ-plot
- MED via Monte Carlo simulation: parametric bootstrap (B = 10,000) from the multivariate parameter distribution to identify the critical AUC where the 95% lower bound of predicted effect exceeds Δ = 75
- Converted AUC threshold to dose using a linear PK bridge model
- Safety analysis: logistic regression modelling side-effect probability at MED vs. placebo
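
The Monte Carlo MED step and the PK bridge above can be sketched as follows. Everything numeric here is an assumption made for illustration: the parameter estimates, their covariance matrix, the multivariate normal form of the parameter distribution, and the PK slope. In practice these would come from the fitted models.

```r
set.seed(123)
B     <- 10000
delta <- 75

# Hypothetical estimates and covariance matrix, standing in for coef() and
# vcov() of a fitted sigmoidal Emax model (order: Emax, AUC50, Hill).
theta <- c(emax = 150, auc50 = 400, hill = 2)
Sigma <- matrix(c(100, 150, 0,
                  150, 900, 0,
                    0,   0, 0.04), nrow = 3, byrow = TRUE)

# Draw B parameter vectors from N(theta, Sigma) via the Cholesky factor.
z     <- matrix(rnorm(B * 3), nrow = B)
draws <- sweep(z %*% chol(Sigma), 2, theta, "+")
colnames(draws) <- names(theta)

# Predicted effect for every draw (rows) over an AUC grid (columns).
auc_grid <- seq(0, 2000, by = 5)
pred <- sapply(auc_grid, function(a) {
  draws[, "emax"] * a^draws[, "hill"] /
    (draws[, "auc50"]^draws[, "hill"] + a^draws[, "hill"])
})

# One-sided 95% lower bound = 5th percentile of the simulated predictions.
lower    <- apply(pred, 2, quantile, probs = 0.05)
crit_auc <- auc_grid[which(lower > delta)[1]]

# Linear PK bridge AUC = slope * dose; the slope is a hypothetical value.
pk_slope <- 5
med_dose <- crit_auc / pk_slope
print(c(crit_auc = crit_auc, med_dose = med_dose))
```

Because the lower bound sits below the point prediction, the critical AUC found this way is larger than the AUC at which the point estimate alone reaches Δ.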
| File | Description |
|---|---|
| `Dose finding study.R` | Full analysis script covering all 4 tasks |
| `data_7.csv` | Simulated clinical trial dataset (200 subjects, 4 dose groups) |
- Optimal dose levels identified via PopED differ from naive equal-spacing assumptions
- Welch t-test and Hodges-Lehmann agree on MED identification
- Sigmoidal Emax (AUC) outperforms all linear models on AIC/BIC
- MED dose is reported with a 95% lower confidence bound on the predicted effect, not just a point estimate
- Side-effect risk at MED is quantified and benchmarked against placebo
```r
install.packages(c(
  "pwr", "readr", "dplyr", "tidyr", "ggplot2",
  "PopED", "minpack.lm", "mvtnorm", "patchwork",
  "DoseFinding", "DescTools", "boot", "GGally", "MASS"
))
```

- Clone the repo
- Place `data_7.csv` in your working directory (or update the `setwd()` path in Task 2)
- Source or run `Dose finding study.R` section by section; each task is clearly delimited with `# Task N----` comments
This analysis was completed as part of the Preclinical and Clinical Data Analysis course in the MSc Pharmaceutical Modelling programme at Uppsala University.
Part of my pharmacometrics portfolio — see my GitHub profile for more.