
[CS598] Wav2Sleep Classification Model Contribution #959

Open
Hannah877 wants to merge 3 commits into sunlabuiuc:master from Hannah877:feat/wav2sleep-contribution

Conversation

Hannah877 (Contributor) commented Apr 10, 2026

Name: Yihan Zhang
NetID/Email: yihan20 (yihan20@illinois.edu)

Type of contribution & Original Paper

Model contribution
Original paper: https://arxiv.org/abs/2411.04644

High-level description

Wav2Sleep is a multi-modal architecture for automated sleep staging from synchronized raw physiological signals (e.g., ECG and respiratory effort).

Key Implementation Details:
• Temporal Encoders: Utilizes specialized CNN-based feature extractors for different sampling frequencies.
• Transformer Backbone: Implements a global Transformer block to capture long-range dependencies between sleep epochs.
• Stochastic Masking: Includes a robust fusion mechanism that handles missing modalities during training, as described in the original paper.
• Custom Output Handling: Implements a sequence-aware output head to bypass standard label processing limitations in pyhealth, ensuring compatibility with sequence prediction tasks.
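To make the stochastic-masking idea concrete, here is a minimal sketch of how a fusion step can drop whole modalities during training so the model learns to handle missing inputs. The function name, dropout probability, and tensor shapes are illustrative assumptions, not the actual implementation in pyhealth/models/wav2sleep.py:

```python
import torch

def stochastic_modality_masking(features, p_drop=0.5, training=True):
    """Randomly drop whole modalities during training (hypothetical sketch).

    features: dict mapping modality name -> (batch, epochs, dim) tensor.
    Returns the mean of the surviving modality embeddings; at least one
    modality is always kept, and all are kept at evaluation time.
    """
    names = list(features)
    if training:
        # Decide independently per modality whether it survives this batch.
        keep = [n for n in names if torch.rand(1).item() > p_drop]
        if not keep:
            # Never drop everything: fall back to one random modality.
            keep = [names[torch.randint(len(names), (1,)).item()]]
    else:
        keep = names
    # Fuse surviving modalities by averaging their embeddings.
    return torch.stack([features[n] for n in keep]).mean(dim=0)

fused = stochastic_modality_masking(
    {"ecg": torch.randn(2, 10, 128), "resp": torch.randn(2, 10, 128)},
    training=False,
)
# Output keeps the per-modality embedding shape: (batch, epochs, dim).
```

Averaging (rather than concatenation) keeps the fused embedding shape independent of how many modalities survive, which is what makes missing-modality inference possible.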

File guide listing which files to review

• pyhealth/models/wav2sleep.py: Core model architecture and logic.
• pyhealth/models/__init__.py: Registration of Wav2Sleep.
• tests/core/test_wav2sleep.py: Unit tests covering instantiation, forward pass, output shapes, and gradient computation.
• examples/sleep_staging_wav2sleep.py: Comprehensive ablation study on synthetic data.
• docs/api/models/pyhealth.models.wav2sleep.rst: API documentation.
• docs/api/models.rst: Added Wav2Sleep entry to the models table.

Ablation Study Summary

I evaluated the model's sensitivity to two key hyperparameters using synthetic sleep staging data (5-stage classification). The experimental setup follows the data structure of the SHHS dataset supported by PyHealth, but uses synthetic tensors for fast reproducibility.
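A minimal sketch of what such a synthetic setup might look like. The shapes and sampling rates below are illustrative assumptions (10 sleep epochs of 30 s, ECG at 64 Hz, respiration at 8 Hz), not the exact SHHS layout:

```python
import torch

batch, n_epochs = 4, 10

# Raw per-epoch waveforms at modality-specific sampling rates (assumed values).
ecg = torch.randn(batch, n_epochs, 30 * 64)    # (B, T, 1920) samples per epoch
resp = torch.randn(batch, n_epochs, 30 * 8)    # (B, T, 240) samples per epoch

# One label per 30-second epoch: 5-stage classification (Wake, N1, N2, N3, REM).
labels = torch.randint(0, 5, (batch, n_epochs))
```

Because the tensors are random, the reported accuracies measure hyperparameter sensitivity and pipeline correctness rather than clinically meaningful performance.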

1. Embedding Dimension Ablation: testing the impact of latent space capacity.

| Configuration | Accuracy | Macro-F1 |
|---|---|---|
| dim=64 | 0.2700 | 0.2629 |
| dim=128 (best) | 0.4000 | 0.3964 |
| dim=256 | 0.3300 | 0.3236 |

2. Transformer Layers Ablation: testing the impact of architectural depth.

| Configuration | Accuracy | Macro-F1 |
|---|---|---|
| layers=1 | 0.3300 | 0.3287 |
| layers=2 (best) | 0.3600 | 0.3625 |
| layers=4 | 0.3600 | 0.3567 |
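The two sweeps above can be sketched with a small Transformer backbone built from standard PyTorch modules. The factory name and hyperparameter names here are assumptions; see pyhealth/models/wav2sleep.py for the real constructor:

```python
import torch
import torch.nn as nn

def make_backbone(embed_dim=128, n_layers=2, n_heads=4):
    """Build a global Transformer block over per-epoch embeddings
    (hypothetical sketch of the ablated backbone)."""
    layer = nn.TransformerEncoderLayer(
        d_model=embed_dim, nhead=n_heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=n_layers)

# Sweep the two ablated hyperparameters over the grids from the tables;
# the backbone preserves the (batch, epochs, dim) sequence shape.
for dim in (64, 128, 256):
    out = make_backbone(embed_dim=dim)(torch.randn(2, 10, dim))
    assert out.shape == (2, 10, dim)
for n_layers in (1, 2, 4):
    out = make_backbone(n_layers=n_layers)(torch.randn(2, 10, 128))
    assert out.shape == (2, 10, 128)
```

In a real ablation each configuration would be trained to convergence before scoring; the shape checks here only confirm the sweep wiring.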

Findings & Conclusion
• Optimal capacity: the model performs best with a 128-dimensional embedding; increasing to 256 degrades both metrics, consistent with overfitting on the small synthetic dataset.
• Depth efficiency: a 2-layer Transformer backbone is sufficient to capture temporal dependencies in these signals; additional layers yield no significant gains.

@Hannah877 Hannah877 marked this pull request as ready for review April 18, 2026 16:09