Polysomnography (PSG) is the gold standard for sleep assessment but suffers from substantial heterogeneity across recording devices and cohorts. There have been growing efforts to build general-purpose foundation models (FMs) for sleep physiology, yet we still lack an in-depth understanding of the pre-training process and scaling patterns that lead to more generalizable sleep FMs.
To fill this gap, we curate a massive corpus of 166,500 hours of sleep recordings from nine public sources and establish SleepBench, a comprehensive, fully open-source benchmark. Leveraging SleepBench, we systematically evaluate four families of self-supervised pre-training objectives and uncover three critical findings: (1) existing FMs fail to generalize to missing channels at inference; (2) channel-invariant feature learning is essential for pre-training; and (3) scaling sample size, model capacity, and multi-source data mixture consistently improves downstream performance.
With an enhanced pre-training and scaling recipe, we introduce OSF, a family of sleep FMs that achieves state-of-the-art performance across nine datasets on diverse sleep and disease prediction tasks. Further analysis of OSF reveals intriguing properties in sample efficiency, hierarchical aggregation, and cross-dataset scaling.
Which pre-training and scaling design choices truly improve the generalization of sleep FMs, especially under cohort shift and missing-channel inference?
Sleep foundation models promise to unify diverse recording setups and patient populations, but current approaches have not been systematically evaluated under realistic deployment scenarios. OSF addresses this gap through comprehensive benchmarking and principled pre-training design.
Through systematic evaluation on SleepBench, we uncover three critical insights that guide the design of more robust and generalizable sleep foundation models.
Existing sleep FMs fail to generalize under missing-channel inference, motivating pre-training designs that explicitly handle channel incompleteness.
Explicitly encouraging channel-invariant feature learning during pre-training improves robustness and downstream transfer, particularly for contrastive and distillation-based methods.
Scaling laws emerge in sleep data; jointly scaling model and data size yields the strongest gains across diverse downstream tasks.
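The channel-invariance idea in the second finding can be made concrete with a small sketch: create two views of the same multichannel epoch by dropping different random subsets of channels, then penalize disagreement between their embeddings. This is an illustrative toy, not OSF's actual pre-training objective; the mean-pool encoder, projection matrix, and loss below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_dropout(x, keep_prob=0.6, rng=rng):
    """Zero out whole channels of a (C, T) epoch at random,
    always keeping at least one channel."""
    C = x.shape[0]
    mask = rng.random(C) < keep_prob
    if not mask.any():
        mask[rng.integers(C)] = True
    return x * mask[:, None]

def encode(x, W):
    """Toy encoder: mean-pool over time, then a linear projection."""
    return W @ x.mean(axis=1)

def invariance_loss(z1, z2, eps=1e-8):
    """1 - cosine similarity between the two views' embeddings."""
    cos = z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2) + eps)
    return 1.0 - cos

# One "pre-training step" on a single synthetic PSG epoch.
C, T, D = 8, 3000, 16          # channels, time samples, embedding dim
x = rng.standard_normal((C, T))
W = rng.standard_normal((D, C))

v1, v2 = channel_dropout(x), channel_dropout(x)
loss = invariance_loss(encode(v1, W), encode(v2, W))
```

Minimizing this loss over many epochs pushes the encoder to produce similar embeddings regardless of which channels happen to be present, which is exactly the property that missing-channel inference requires.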
OSF is evaluated across nine datasets on diverse sleep staging and event detection tasks. It consistently achieves state-of-the-art performance under linear probing, few-shot learning, and fine-tuning settings.
Best overall performance on multi-class sleep stage classification across diverse cohorts.
Superior detection of sleep events including arousal, hypopnea, oxygen desaturation, and central apnea.
Extracted embeddings better capture disease-related information.
Strong performance with frozen features, demonstrating high-quality representations.
Further gains when adapting to specific downstream tasks with full model training.
Sample efficiency in adapting to downstream tasks.
We evaluate models on four sleep event detection tasks. As shown in the table, OSF achieves state-of-the-art performance on both sleep staging and event detection under linear probing and fine-tuning.
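The linear-probing protocol used in these evaluations can be sketched as training a softmax classifier on frozen embeddings. The sketch below uses synthetic clustered embeddings as a stand-in for OSF features; the five classes loosely mirror the standard sleep stages.

```python
import numpy as np

rng = np.random.default_rng(1)

def linear_probe(Z, y, n_classes, lr=0.1, steps=200):
    """Fit a multinomial logistic-regression head on frozen
    embeddings Z of shape (N, D) via gradient descent."""
    N, D = Z.shape
    W = np.zeros((D, n_classes))
    Y = np.eye(n_classes)[y]                      # one-hot labels
    for _ in range(steps):
        logits = Z @ W
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)         # softmax
        W -= lr * Z.T @ (P - Y) / N               # cross-entropy gradient
    return W

# Synthetic frozen embeddings for 5 classes (e.g. W/N1/N2/N3/REM).
n_classes, D, per_class = 5, 16, 40
centers = rng.standard_normal((n_classes, D)) * 3
y = np.repeat(np.arange(n_classes), per_class)
Z = centers[y] + rng.standard_normal((len(y), D))

W = linear_probe(Z, y, n_classes)
acc = (np.argmax(Z @ W, axis=1) == y).mean()
```

Because the encoder stays frozen, linear-probe accuracy directly measures representation quality; fine-tuning instead updates the encoder jointly with the head.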
OSF consistently outperforms SleepFM across these missing-channel settings. Specifically, (1) OSF makes better use of the available channels. With brain-activity channels only, it achieves stronger sleep staging and arousal detection, suggesting stronger brain-related representations. Similarly, with respiratory channels only, it achieves stronger performance on hypopnea and oxygen desaturation.
(2) OSF is more robust when key modalities are missing. When respiratory signals are removed, both methods degrade on hypopnea and oxygen desaturation, but OSF remains consistently better. Conversely, when brain-related channels are unavailable, sleep staging becomes much harder for both models; nevertheless, OSF better uses the remaining channels and yields stronger performance.
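Operationally, the missing-channel settings above amount to zeroing out whole channel groups at inference and feeding the masked recording to the frozen model. A minimal sketch follows; the six-channel montage and the group-to-index mapping are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical channel layout for a 6-channel montage.
GROUPS = {
    "brain": [0, 1],          # e.g. EEG / EOG
    "respiratory": [2, 3],    # e.g. airflow, abdominal belt
    "other": [4, 5],          # e.g. ECG, EMG
}

def mask_channels(x, keep_groups, groups=GROUPS):
    """Zero every channel outside the kept groups, mimicking
    missing-channel inference on a (C, T) recording."""
    keep = sorted(i for g in keep_groups for i in groups[g])
    out = np.zeros_like(x)
    out[keep] = x[keep]
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((6, 3000))

brain_only = mask_channels(x, ["brain"])        # sleep staging / arousal setting
resp_only = mask_channels(x, ["respiratory"])   # hypopnea / desaturation setting
```

Sweeping `keep_groups` over channel subsets and re-running the downstream heads reproduces the evaluation grid described above.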
@article{shuai2026osf,
title={OSF: On Pre-training and Scaling of Sleep Foundation Models},
author={Shuai, Zitao and Xu, Zongzhe and Yang, David and Wang, Wei and Yang, Yuzhe},
journal={arXiv preprint},
year={2026}
}