Skip to content

CMU-SAFARI/Virtuoso

Repository files navigation

Virtuoso: Fast and Accurate Virtual Memory Research via Imitation-based OS Simulation

ASPLOS 2025 License: MIT

Virtual memory is a significant performance bottleneck across modern workloads. Researchers need tools to explore new hardware/OS co-designs that optimize virtual memory across diverse applications and systems. Existing tools either lack accuracy in modeling OS software components or are too slow for prototyping designs that span the hardware/software boundary.

Virtuoso addresses this challenge through an imitation-based OS simulation methodology. At its core is MimicOS, a lightweight userspace OS kernel that imitates only the necessary kernel functionalities (e.g., physical memory allocation, page fault handling). MimicOS accelerates simulation compared to full-system OS simulation while providing accessible high-level programming interfaces for developing new OS memory management routines, enabling flexible and precise evaluation of virtual memory's application-level and system-level effects.

Virtuoso integrates with diverse architectural simulators, each specializing in different system design aspects. It currently supports Sniper (event-driven CPU simulation) and Ramulator2 (cycle-accurate DRAM timing), with a shared MimicOS core ensuring consistent OS behavior across simulation backends.

Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Nandita Vijaykumar, and Onur Mutlu, "Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology," ASPLOS 2025. [Paper]

Table of Contents

Key Features

MimicOS: Imitation-based OS Kernel

MimicOS is a lightweight userspace kernel that imitates the OS memory management subsystem. Rather than running a full Linux kernel, MimicOS provides only the routines required for virtual memory research: physical memory allocation, page fault handling, huge page management, and swap. It uses a policy-based template design so the same allocator and OS service implementations work across Sniper, Ramulator2, and future simulator integrations.

Physical Memory Allocators

Allocator Description
Baseline Simple 4KB buddy-based page allocator
ReserveTHP Reservation-based transparent huge pages (4KB + 2MB) with configurable fragmentation
SpOT Contiguity-aware allocator exploiting OS allocation patterns (Alverti et al., ISCA '20)
ASAP Prefetched address translation with aggressive superpage allocation (Margaritov et al., MICRO '19)
Utopia Restricted segments (RestSegs) with direct VA-to-PA computation (Kanellopoulos et al., MICRO '23)
EagerPaging Contiguous physical range allocation for entire VMAs, used by RMM (Karakostas et al., ISCA '15)
NUMA ReserveTHP Multi-node reservation-based THP with per-node capacity and placement policies
Buddy Power-of-two buddy system allocator (shared foundation for all allocators)

Page Table Formats

Page Table Description
4-Level Radix Standard x86-64 radix page table (PML4/PDPT/PD/PT)
Elastic Cuckoo Hash (ECH) Cuckoo hashing with elastic bucket resizing (Skarlatos et al., ASPLOS '20)
Hash Don't Cache (HDC) Open-addressing hash table with linear probing and dynamic resizing (Yaniv and Tsafrir, SIGMETRICS '16)
Hash Table Chaining Chained hash table with dynamic resizing
Range Table B-tree based range translations for contiguous mappings (Karakostas et al., ISCA '15)

MMU Designs

MMU Description
Base Configurable multi-level TLB hierarchy with page walk caches and large page prediction
Spec Parallel speculative and conventional page walks (Barr et al., ISCA '11)
POM-TLB Part-of-Memory TLB with software-managed large TLB in DRAM (Ryoo et al., ISCA '17)
Range/RMM Range Lookup Buffer for contiguous translations with eager paging (Karakostas et al., ISCA '15)
DMT Direct Memory Translation for virtualized clouds (Zhang et al., ASPLOS '24)
Utopia RestSeg walker with CATS prediction and page migration (Kanellopoulos et al., MICRO '23)
HW Fault Hardware page fault handler with delegated memory pool
Virt Nested MMU for two-dimensional guest-to-host address translation

TLB Prefetchers

Prefetcher Description
Agile/ATP Adaptive multi-stride TLB prefetcher with frequency detection (Vavouliotis et al., ISCA '21)
Recency Pointer-table based recency-aware TLB prefetcher
Distance Distance-indexed prediction table for irregular access patterns
Stride Classic next-page stride TLB prefetcher
H2 History-based TLB prefetcher
ASP PC-indexed arbitrary stride prefetcher

Speculative Translation Engines

Engine Description
SpOT Offset-based speculation exploiting physical contiguity (Alverti et al., ISCA '20)
Oracle Perfect speculation for upper-bound analysis
SpecTLB Speculative address translation mechanism (Barr et al., ISCA '11)
ASAP Speculative translation via prefetched address translation (Margaritov et al., MICRO '19)

Additional Features

  • CHiRP: Control-flow history reuse prediction for dead-entry aware cache replacement (Mirbagher-Ajorpaz et al., MICRO '20)
  • MPLRU: Metadata-Priority LRU adaptive cache controller with multi-armed bandit tuning
  • HugeTLBfs and Swap Cache: Simulator-agnostic templates in sniper/include/
  • Multicore support: 2, 4, 8, and 16 core configurations with shared TLB hierarchies
  • NUMA support: Multi-node topologies (2-node/8-core, 4-node/16-core)
  • CXL memory tiers: Configurable CXL-attached memory with ASIC/FPGA/tiered latency models
  • Ramulator2 integration: MimicOS IPC bridge for cycle-accurate DRAM timing during page walks
  • Comprehensive experiment framework: YAML-driven configs, jobfile generation, SLURM submission, and smoke tests

Repository Structure

Virtuoso/
├── mimicos/                     # Shared MimicOS userspace kernel
│   ├── include/mm/              # Allocator factory, radix page table headers
│   └── src/                     # MimicOS core, page table, allocator implementations
│       ├── mimicos/             # MimicOS kernel runtime
│       ├── page_table/          # Page table implementations
│       ├── physical_allocator/  # Physical memory allocator implementations
│       ├── metrics/             # Performance counters and telemetry
│       └── inih/                # INI config parser
│
├── simulator/
│   ├── sniper/                  # Sniper multi-core simulator (trace-driven)
│   │   ├── common/              # Core simulation code
│   │   │   └── core/memory_subsystem/
│   │   │       └── parametric_dram_directory_msi/
│   │   │           ├── mmu_designs/           # MMU implementations
│   │   │           ├── spec_engine_designs/   # Speculative engine implementations
│   │   │           └── translation_components/
│   │   │               ├── tlb.cc/h                   # TLB model
│   │   │               ├── tlb_subsystem.cc           # TLB hierarchy
│   │   │               └── tlb_prefetching/           # TLB prefetcher implementations
│   │   ├── include/
│   │   │   └── memory_management/
│   │   │       ├── physical_memory_allocators/  # Simulator-agnostic allocator templates
│   │   │       ├── hugetlbfs.h                  # HugeTLBfs template
│   │   │       ├── swap_cache.h                 # Swap cache template
│   │   │       └── numa/                        # NUMA topology headers
│   │   ├── config/                              # Modular configuration files
│   │   │   ├── address_translation_schemes/     # Top-level composed configs
│   │   │   │   └── multicore/                   # Multi-core variants
│   │   │   ├── core_configs/                    # CPU core models
│   │   │   ├── mmu_configs/                     # MMU design configs
│   │   │   ├── pagetable_configs/               # Page table format configs
│   │   │   ├── physical_memory_allocators/      # Allocator configs
│   │   │   ├── spec_engine_configs/             # Speculative engine configs
│   │   │   ├── dram_configs/                    # DRAM timing configs
│   │   │   ├── prefetcher_configs/              # Data prefetcher configs
│   │   │   ├── cxl_configs/                     # CXL memory tier configs
│   │   │   └── common_configs/                  # Base system configs
│   │   └── testing/
│   │       └── test_config.yaml                 # Smoke test definitions
│   └── ramulator2/              # Ramulator2 DRAM simulator (submodule)
│
├── experiments/                 # Experiment framework
│   ├── clist.yaml               # Central experiment configuration
│   ├── create_experiments.py    # Jobfile generator
│   ├── safe_submit.py           # Validated SLURM submission
│   ├── get_experiments_status.py # Experiment monitoring
│   ├── create_rerun_experiments.py # Failed experiment re-launcher
│   └── vm_tlist/                # Trace list files
│
├── patches/                     # Integration patches
│   ├── apply_ramulator2_mimicos.sh  # Ramulator2 + MimicOS patch script
│   └── ramulator2_mimicos.patch     # The patch file
│
├── vma_infer/                   # VMA inference tool (infers VMAs from traces)
├── tools/                       # Analysis and mini-simulation tools
└── docs/                        # Documentation
    └── ramulator2_mimicos.md    # Ramulator2 + MimicOS integration guide

Prerequisites

Hardware

  • Architecture: x86-64 system
  • Memory: 4 to 13 GB free memory per experiment
  • Storage: 10 GB for traces and build artifacts

Software

  • Python 3.8+ (3.10 recommended)
  • libpython3-dev (matching your Python version)
  • g++ (with C++17 support)
  • GNU Make
  • PyYAML (pip install pyyaml)
  • SLURM (optional, for cluster job submission)

Quick Start

1. Clone the Repository

git clone --recursive https://github.com/CMU-SAFARI/Virtuoso.git
cd Virtuoso

If you already cloned without --recursive:

git submodule update --init --recursive

2. Install Dependencies

cd simulator/sniper/
sh install_dependencies.sh
bash  # reload environment

# Optional: create a conda environment
conda create -n virtuoso python=3.10
conda activate virtuoso

3. Build Sniper

cd simulator/sniper
make distclean  # clean previous builds
make -j         # build Sniper

4. Download Traces

cd simulator/sniper/
sh download_traces.sh

5. Run a Basic Simulation

cd simulator/sniper/
sh run_example.sh

This runs a baseline simulation with the 4-level radix page table and ReserveTHP allocator on a sample trace.

6. Run Smoke Tests

Run the comprehensive smoke test suite (22 configurations, 2M instructions each):

cd experiments/
python3 create_experiments.py \
  --artifact-path /path/to/Virtuoso \
  --yaml clist.yaml \
  --suite smoke-test-all \
  --suite-dir-name exp_smoke_test_all \
  --force

Then submit (or run locally):

python3 safe_submit.py exp_smoke_test_all/jobfile.sh --dry-run  # validate first
python3 safe_submit.py exp_smoke_test_all/jobfile.sh             # submit to SLURM

Configuration System

Virtuoso uses a modular, composable configuration system. Each top-level address translation scheme is assembled from independent config modules via #include directives:

address_translation_schemes/<scheme>.cfg
    #include common_configs/base_system_sniperspace.cfg   # Simulation mode
    #include core_configs/meteor_lake_pcore.cfg            # CPU core model
    #include mmu_configs/mmu_base.cfg                      # MMU design
    #include pagetable_configs/4_level_radix.cfg            # Page table format
    #include dram_configs/ddr4_2400.cfg                     # DRAM timing
    #include physical_memory_allocators/reserve_thp.cfg     # Memory allocator

Config Module Directories

Directory Purpose Examples
common_configs/ Simulation mode (Sniper-space vs. userspace MimicOS) base_system_sniperspace.cfg, base_system_userspace.cfg
core_configs/ CPU core microarchitecture meteor_lake_pcore.cfg, beefy.cfg, wimpy.cfg
mmu_configs/ MMU design selection and TLB hierarchy mmu_base.cfg, mmu_spec.cfg, mmu_utopia.cfg, mmu_pomtlb.cfg
pagetable_configs/ Page table format 4_level_radix.cfg, elastic_cuckoo_hash_table_asplos2020.cfg
physical_memory_allocators/ Physical memory allocation policy reserve_thp.cfg, spot.cfg, utopia.cfg, eager_paging.cfg
dram_configs/ DRAM timing model ddr4_2400.cfg, ddr5_4800.cfg, ddr5_6400.cfg
spec_engine_configs/ Speculative translation engine spec_engine_spot.cfg, spec_engine_oracle.cfg
prefetcher_configs/ Data cache prefetcher ip_stride_l1.cfg, meteor_lake_prefetch.cfg
cxl_configs/ CXL-attached memory tiers cxl.cfg, cxl-tiered.cfg

To create a new scheme, compose the desired modules:

# My custom scheme: Utopia allocator + ECH page table + SpecTLB engine
#include ./common_configs/base_system_sniperspace.cfg
#include ./core_configs/meteor_lake_pcore.cfg
#include ./mmu_configs/mmu_spec.cfg
#include ./pagetable_configs/elastic_cuckoo_hash_table_asplos2020.cfg
#include ./dram_configs/ddr4_2400.cfg
#include ./physical_memory_allocators/utopia.cfg
#include ./spec_engine_configs/spec_engine_spectlb.cfg

Experiment Framework

The experiments/ directory provides an end-to-end workflow for running large-scale studies on a SLURM cluster.

Central Configuration: clist.yaml

All experiments are defined in a single YAML file:

# Trace suites: named collections of .tlist files
trace_suite:
  top250:
    tracelist_base_path: "vm_tlist/"
    tracelists:
      - top250_cvp1.tlist
      - top250_gap.tlist
      - top250_google.tlist

# Configs: composable simulation configurations
configs:
  configs_base_path: "../simulator/sniper/config/address_translation_schemes/"
  reservethp-full-fragmentation:
    name: "reservethp-frag0.0"
    file: "reservethp.cfg"
    extends:
      - "--perf_model/reserve_thp/target_fragmentation=0.0"
    description: "ReserveTHP with fully fragmented memory"

# Experiment suites: combine traces + configs + instruction count
experiment_suites:
  my-experiment:
    description: "ReserveTHP fragmentation sweep"
    configs: [reservethp-full-fragmentation, reservethp-no-fragmentation]
    trace_suite: top250
    instruction_count: 300000000

Workflow

Step 1: Generate jobfile

python3 experiments/create_experiments.py \
  --artifact-path /path/to/Virtuoso \
  --yaml experiments/clist.yaml \
  --suite my-experiment \
  --suite-dir-name my-experiment \
  --force

This creates experiments/exp_my-experiment/ with jobfile.sh, trace/config/job CSVs, and a results/ directory.

Step 2: Validate and submit

python3 experiments/safe_submit.py experiments/exp_my-experiment/jobfile.sh \
  --max-slurm-jobs 500 \
  --slurm-retry-delay 60

safe_submit.py validates that Sniper is compiled, debug flags are off, traces exist, configs exist, and disk space is sufficient before submitting.

Step 3: Monitor progress

python3 experiments/get_experiments_status.py \
  --exp-dir experiments/exp_my-experiment

Step 4: Rerun failures

python3 experiments/create_rerun_experiments.py \
  --exp-dir experiments/exp_my-experiment \
  --status experiments/exp_my-experiment/status/error.csv \
  --jobfile experiments/exp_my-experiment/jobfile_rerun.sh

Parameter Sweeps

Configs support parameter sweeps for systematic exploration:

# Zipped sweep: paired fragmentation + promotion threshold values
reservethp_sweep:
  name: "reservethp"
  file: "reservethp.cfg"
  sweeps:
    - identifier: ["frag", "promotionthresh"]
      option: ["--perf_model/reserve_thp/target_fragmentation=",
               "--perf_model/reserve_thp/threshold_for_promotion="]
      values: [[0.0, 0.25, 0.5, 0.75, 1.0],
               [0.0, 0.0, 0.0, 0.0, 0.0]]

Multiple sweep entries produce a Cartesian product across dimensions.

Ramulator2 Integration

Virtuoso integrates with Ramulator2 for cycle-accurate DRAM timing modeling during page table walks. MimicOS communicates with Ramulator2 via an IPC bridge using pipes and the Intel SDE pintool.

For setup instructions, configuration, and architecture details, see docs/ramulator2_mimicos.md.

Quick setup:

# Apply the MimicOS integration patch to Ramulator2
bash patches/apply_ramulator2_mimicos.sh

# Build Ramulator2
cd simulator/ramulator2 && mkdir -p build && cd build && cmake .. && make -j

# Build MimicOS standalone
cd simulator/ramulator2/mimicos
make SNIPER_INCLUDE=$PWD/../../sniper/include

Smoke Tests

The smoke-test-all suite validates 22 configurations with 2M instructions on a quick trace. These configurations cover the major axes of the framework:

Category Configurations
Allocators no-translation, reservethp (frag 0.0), reservethp (frag 1.0), spot, utopia, asap
TLB Prefetchers atp-baseline, dp-baseline, recency-baseline
Speculative Engines spec-oracle, spectlb
MMU Designs pomtlb, dmt, rmm
Page Tables ech-baseline, hdc-baseline, ht-baseline, reservethp-ech, reservethp-hdc, reservethp-ht
Userspace MimicOS userspace-baseline, userspace-reservethp

Run smoke tests individually by category:

# Allocators only
--suite smoke-test-allocators

# Prefetchers only
--suite smoke-test-prefetchers

# Page table variants only
--suite smoke-test-page-tables

# MMU designs only
--suite smoke-test-mmu-designs

# Speculative engines only
--suite smoke-test-spec-engines

# Userspace MimicOS only
--suite smoke-test-userspace

Website and Documentation

Citation

@inproceedings{kanellopoulos2025virtuoso,
    title={{Virtuoso: Enabling Fast and Accurate Virtual Memory Research
            via an Imitation-based Operating System Simulation Methodology}},
    author={Konstantinos Kanellopoulos and Konstantinos Sgouras and
            F. Nisa Bostanci and Andreas Kosmas Kakolyris and
            Berkin K. Konar and Rahul Bera and Mohammad Sadrosadati and
            Rakesh Kumar and Nandita Vijaykumar and Onur Mutlu},
    year={2025},
    booktitle={ASPLOS}
}

Contributing

We welcome contributions. Please open an issue on GitHub to discuss potential changes or report bugs. Pull requests for new allocators, MMU designs, page tables, or TLB prefetchers are especially welcome. The modular architecture makes it straightforward to add new components.

License

Virtuoso is released under the MIT License, consistent with the Sniper simulator.

Contact

For questions, please contact:

About

Virtuoso is a fast, accurate and versatile simulation framework designed for virtual memory research. Virtuoso uses a new simulation methodology for estimating OS overheads and models diverse VM designs, incorporating state-of-the-art TLB techniques, page table structures etc. More details in our ASPLOS 2025 paper: https://arxiv.org/pdf/2403.04635

Resources

Stars

Watchers

Forks

Contributors