Smaller, faster models distilled from NVIDIA's Alpamayo-R1 vision-language-action model for autonomous driving.
This repo contains:
- Data pipeline for training on PhysicalAI-AV
- Teacher label generation from Alpamayo-R1
- Student models (~485M params) - compact trajectory predictors using unicycle kinematics
The student model predicts acceleration and steering curvature rather than waypoints directly, then integrates those actions through a unicycle kinematic model. This is more stable to train and yields physically plausible trajectories. See the Alpamayo paper for details on why action-based prediction outperforms direct waypoint regression.
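The integration step can be sketched as a standard discrete unicycle rollout (a generic sketch; the repo's exact discretization may differ):

```python
import math

def unicycle_rollout(x, y, yaw, v, accels, curvatures, dt=0.1):
    """Integrate per-step (acceleration, curvature) actions through a
    unicycle model to produce an (x, y, yaw) trajectory."""
    traj = []
    for a, k in zip(accels, curvatures):
        x += v * math.cos(yaw) * dt  # advance position along current heading
        y += v * math.sin(yaw) * dt
        yaw += k * v * dt            # curvature * arc length = heading change
        v += a * dt                  # update speed from predicted acceleration
        traj.append((x, y, yaw))
    return traj
```

With zero acceleration and zero curvature the rollout is a straight line at constant speed; any curvature the network predicts automatically bends the path smoothly, which is why the output stays physically plausible.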
Setup:

```bash
git clone https://github.com/mu-hashmi/alpamayo-r1-distilled.git
cd alpamayo-r1-distilled

# Clone external dependencies
mkdir -p external && cd external
git clone https://github.com/NVlabs/alpamayo.git
git clone https://github.com/NVlabs/physical_ai_av.git
cd ..

# Install (CPU-only, for training the student)
uv sync

# For teacher label generation (requires GPU), also install:
uv pip install torch sentence-transformers
uv pip install -e external/physical_ai_av
uv pip install -e external/alpamayo
```

Extract camera frames from the HuggingFace dataset:

```bash
uv run python scripts/extract_frames.py --split train --resolution 224 --num-samples 10000
uv run python scripts/extract_frames.py --split val --resolution 224 --num-samples 1000
```

Generate teacher labels from Alpamayo-R1:

```bash
uv run python scripts/generate_teacher_labels.py --split train
uv run python scripts/generate_teacher_labels.py --split val
```

Label generation supports sharding for multi-GPU machines:

```bash
CUDA_VISIBLE_DEVICES=0 uv run python scripts/generate_teacher_labels.py --split train --shard 0/4
CUDA_VISIBLE_DEVICES=1 uv run python scripts/generate_teacher_labels.py --split train --shard 1/4
# ...
uv run python scripts/merge_shards.py --split train --num-shards 4
```

Generate ego history, which provides velocity context for unicycle integration:

```bash
uv run python scripts/generate_ego_history.py --split all
```

A minimal offline training run:

```bash
uv run python -m src.distillation.train --model baseline --offline --max-epochs 2
```

The following configuration includes the full stabilization stack for reliable training:
```bash
uv run python -m src.distillation.train \
  --model baseline \
  --model-size 500m \
  --lr 1e-5 \
  --grad-clip 1.0 \
  --ema --ema-decay 0.999 \
  --action-weight 0.1 \
  --freeze-bn \
  --batch-size 32 \
  --max-epochs 50
```

Key flags:

- `--lr 1e-5` - Low learning rate (sequential integration amplifies gradients)
- `--grad-clip 1.0` - Prevents gradient explosions from bad batches
- `--ema` - Smooths weight oscillations
- `--action-weight 0.1` - Auxiliary supervision on predicted actions
- `--freeze-bn` - Uses pretrained BatchNorm statistics (avoids train/eval mismatch)
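Two of these stabilizers, gradient clipping and EMA, can be sketched in a few lines of plain Python (illustrative only, not the repo's trainer code):

```python
import math

def clip_grad_norm(grads, max_norm=1.0):
    """Scale all gradients down so their global L2 norm is at most max_norm,
    which caps the damage a single bad batch can do to the weights."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads

def ema_update(ema_weights, weights, decay=0.999):
    """Exponential moving average of the weights; evaluating the EMA copy
    instead of the raw weights smooths step-to-step oscillations."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]
```

In a real trainer these run once per optimizer step: clip the gradients before `optimizer.step()`, then fold the updated weights into the EMA copy.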
For comparison, you can train without unicycle integration:
```bash
uv run python -m src.distillation.train \
  --model baseline \
  --no-unicycle \
  --lr 1e-4 \
  --batch-size 32
```

This predicts (x, y, yaw) directly. Expect higher error and less stable training.
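The two output parameterizations are related by the unicycle kinematics: action targets can be recovered from an (x, y) waypoint trajectory by finite differences. A hypothetical sketch (the repo's actual label derivation may differ):

```python
import math

def actions_from_waypoints(xs, ys, dt=0.1):
    """Recover per-step (acceleration, curvature) targets from an (x, y)
    waypoint sequence -- the inverse of a unicycle rollout."""
    vs, yaws = [], []
    for i in range(len(xs) - 1):
        dx, dy = xs[i + 1] - xs[i], ys[i + 1] - ys[i]
        vs.append(math.hypot(dx, dy) / dt)   # speed from segment length
        yaws.append(math.atan2(dy, dx))      # heading from segment direction
    accels = [(vs[i + 1] - vs[i]) / dt for i in range(len(vs) - 1)]
    curvatures = [
        (yaws[i + 1] - yaws[i]) / max(vs[i] * dt, 1e-6)  # heading change per arc length
        for i in range(len(vs) - 1)
    ]
    return accels, curvatures
```

A straight constant-speed trajectory maps to all-zero actions, which is one reason the action space is easier to regress: typical driving is dominated by near-zero accelerations and curvatures.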
```bash
# nuScenes
uv run python -m src.distillation.train \
  --model baseline \
  --benchmark nuscenes \
  --nuscenes-root /path/to/nuscenes \
  --input-type raster \
  --num-modes 6

# Argoverse 2
uv run python -m src.distillation.train \
  --model baseline \
  --benchmark argoverse2 \
  --argoverse2-root /path/to/av2 \
  --input-type vector \
  --num-modes 6
```

To resume from a checkpoint:

```bash
uv run python -m src.distillation.train \
  --model baseline \
  --resume checkpoints/baseline_seed42_latest.pt
```

See docs/TRAINING.md for all configuration options.
- Alpamayo-R1 Paper - Architecture and training details
- Alpamayo-R1 Model - Teacher model
- PhysicalAI-AV Dataset - Training data
License: MIT