
[CS598] Wav2Sleep Classification Model Contribution #959

Open
Hannah877 wants to merge 3 commits into sunlabuiuc:master from Hannah877:feat/wav2sleep-contribution

Conversation

Hannah877 (Contributor) commented Apr 10, 2026

Name: Yihan Zhang
NetID/Email: yihan20 (yihan20@illinois.edu)

Type of contribution & Original Paper

Model contribution
Original paper: https://arxiv.org/abs/2411.04644

High-level description

Wav2Sleep is a multi-modal architecture for automated sleep staging from synchronized raw physiological signals (e.g., ECG and respiratory effort).

Key Implementation Details:
• Temporal Encoders: Utilizes specialized CNN-based feature extractors for different sampling frequencies.
• Transformer Backbone: Implements a global Transformer block to capture long-range dependencies between sleep epochs.
• Stochastic Masking: Includes a robust fusion mechanism that handles missing modalities during training, as described in the original paper.
• Custom Output Handling: Implements a sequence-aware output head to bypass standard label processing limitations in pyhealth, ensuring compatibility with sequence prediction tasks.
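To make the stochastic-masking idea concrete, here is a minimal sketch of how a fusion step can drop whole modalities during training so the model learns to handle missing inputs. The function name, dropout probability, and tensor shapes are illustrative assumptions, not the actual implementation in pyhealth/models/wav2sleep.py:

```python
import torch

def stochastic_modality_masking(features, p_drop=0.5, training=True):
    """Randomly drop whole modalities during training (hypothetical sketch).

    features: dict mapping modality name -> (batch, epochs, dim) tensor.
    Returns the mean of the surviving modality embeddings; at least one
    modality is always kept, and all are kept at evaluation time.
    """
    names = list(features)
    if training:
        # Decide independently per modality whether it survives this batch.
        keep = [n for n in names if torch.rand(1).item() > p_drop]
        if not keep:
            # Never drop everything: fall back to one random modality.
            keep = [names[torch.randint(len(names), (1,)).item()]]
    else:
        keep = names
    # Fuse surviving modalities by averaging their embeddings.
    return torch.stack([features[n] for n in keep]).mean(dim=0)

fused = stochastic_modality_masking(
    {"ecg": torch.randn(2, 10, 128), "resp": torch.randn(2, 10, 128)},
    training=False,
)
# Output keeps the per-modality embedding shape: (batch, epochs, dim).
```

Averaging (rather than concatenation) keeps the fused embedding shape independent of how many modalities survive, which is what makes missing-modality inference possible.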

File guide listing which files to review

• pyhealth/models/wav2sleep.py: Core model architecture and logic.
• pyhealth/models/__init__.py: Registration of Wav2Sleep.
• tests/core/test_wav2sleep.py: Unit tests covering instantiation, forward pass, output shapes, and gradient computation.
• examples/sleep_staging_wav2sleep.py: Comprehensive ablation study on synthetic data.
• docs/api/models/pyhealth.models.wav2sleep.rst: API documentation.
• docs/api/models.rst: Added Wav2Sleep entry to the models table.

Ablation Study Summary

I evaluated the model's sensitivity to two key hyperparameters using synthetic sleep staging data (5-stage classification). The experimental setup follows the data structure of the SHHS dataset supported by PyHealth, but uses synthetic tensors for fast reproducibility.
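A minimal sketch of what such a synthetic setup might look like. The shapes and sampling rates below are illustrative assumptions (10 sleep epochs of 30 s, ECG at 64 Hz, respiration at 8 Hz), not the exact SHHS layout:

```python
import torch

batch, n_epochs = 4, 10

# Raw per-epoch waveforms at modality-specific sampling rates (assumed values).
ecg = torch.randn(batch, n_epochs, 30 * 64)    # (B, T, 1920) samples per epoch
resp = torch.randn(batch, n_epochs, 30 * 8)    # (B, T, 240) samples per epoch

# One label per 30-second epoch: 5-stage classification (Wake, N1, N2, N3, REM).
labels = torch.randint(0, 5, (batch, n_epochs))
```

Because the tensors are random, the reported accuracies measure hyperparameter sensitivity and pipeline correctness rather than clinically meaningful performance.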

1. Embedding Dimension Ablation: testing the impact of latent space capacity.

| Configuration | Accuracy | Macro-F1 |
|---|---|---|
| dim=64 | 0.2700 | 0.2629 |
| dim=128 (best) | 0.4000 | 0.3964 |
| dim=256 | 0.3300 | 0.3236 |

2. Transformer Layers Ablation: testing the impact of architectural depth.

| Configuration | Accuracy | Macro-F1 |
|---|---|---|
| layers=1 | 0.3300 | 0.3287 |
| layers=2 (best) | 0.3600 | 0.3625 |
| layers=4 | 0.3600 | 0.3567 |
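The two sweeps above can be sketched with a small Transformer backbone built from standard PyTorch modules. The factory name and hyperparameter names here are assumptions; see pyhealth/models/wav2sleep.py for the real constructor:

```python
import torch
import torch.nn as nn

def make_backbone(embed_dim=128, n_layers=2, n_heads=4):
    """Build a global Transformer block over per-epoch embeddings
    (hypothetical sketch of the ablated backbone)."""
    layer = nn.TransformerEncoderLayer(
        d_model=embed_dim, nhead=n_heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=n_layers)

# Sweep the two ablated hyperparameters over the grids from the tables;
# the backbone preserves the (batch, epochs, dim) sequence shape.
for dim in (64, 128, 256):
    out = make_backbone(embed_dim=dim)(torch.randn(2, 10, dim))
    assert out.shape == (2, 10, dim)
for n_layers in (1, 2, 4):
    out = make_backbone(n_layers=n_layers)(torch.randn(2, 10, 128))
    assert out.shape == (2, 10, 128)
```

In a real ablation each configuration would be trained to convergence before scoring; the shape checks here only confirm the sweep wiring.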

Findings & Conclusion
• Optimal capacity: the model performs best with a 128-dimensional embedding; increasing to 256 degrades both metrics, consistent with overfitting on the small synthetic dataset.
• Depth efficiency: a 2-layer Transformer backbone is sufficient to capture temporal dependencies in these signals; additional layers yield no significant gains.

@Hannah877 Hannah877 marked this pull request as ready for review April 18, 2026 16:09