What question did this study set out to answer?

The aim is to assess a multitask AI model for analyzing sleep-related conditions from nocturnal breathing sounds.

May 10, 2026

1260 A Large Multitask Sound AI Model for Comprehensive Sleep Analysis from Nocturnal Breathing Sounds

Key Points

Developed a multi-head classifier integrating four sleep-related tasks, trained on 2,973 nights of PSG data with synchronized audio.
Evaluated the model on an independent dataset of 802 nights.
Represented each 30-second epoch as 80 consecutive Mel-spectrogram frames.
Unified model achieved 80.6% accuracy for sleep staging and 89.5% accuracy for OSA detection.
Newly integrated tasks improved slightly over previously reported models: desaturation (F1 0.80, sensitivity 0.88), arousal (F1 0.67, sensitivity 0.74), sleep position (F1 0.88, accuracy 84.0%).
The system maintains performance comparable to existing single-task models.
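
The F1, sensitivity, and specificity figures above are all derived from epoch-level confusion counts. As a minimal sketch of how these metrics relate to one another, the Python below computes them from counts that are purely hypothetical and not taken from the study; the function names are illustrative as well.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute sensitivity, specificity, precision and F1 from confusion counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall on positive epochs
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # recall on negative epochs
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def macro_f1(per_class_f1):
    """Macro F1 is the unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1) / len(per_class_f1)


# Hypothetical counts for one binary task (NOT data from the study).
m = binary_metrics(tp=80, fp=20, fn=20, tn=380)
print(m)
```

Because macro F1 averages the per-class scores without weighting, a rare class (such as epochs containing an event) counts as much as a common one, which is why it can sit well below overall accuracy.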

Abstract

Introduction

Recent sound-based AI models have shown strong performance in sleep staging, obstructive sleep apnea (OSA) event detection, and snore detection from nocturnal breathing sounds. Prior studies have also demonstrated that breathing sounds contain physiological signatures related to oxygen desaturation, arousals, and sleep position. However, current models typically use a shared feature extractor followed by separate task-specific classifiers, so parameter count and training time grow rapidly as the number of tasks increases. To address this, we converted one of the existing classifiers into a multi-head classifier that outputs multiple tasks via lightweight multilayer perceptron (MLP) heads, incurring minimal additional computational cost. This study evaluates the unified model across six sleep-related tasks.

Methods

The model integrates four tasks (OSA event, desaturation, arousal, and sleep position) into one shared classifier with lightweight MLP heads; sleep staging and snore detection retain their existing dedicated classifiers. The architecture processes 80 consecutive Mel-spectrogram frames representing each 30-second epoch. Training was performed using 2,973 nights of PSG data with synchronized audio, and evaluation was conducted on an independent dataset of 802 nights (age 50.5 ± 15.4 years; BMI 25.5 ± 3.6 kg/m²; AHI 21.2 ± 20.6 events/h; male:female = 530:272). All tasks except sleep staging were trained at the sub-epoch level but evaluated at the epoch level for comparison with previous single-task models.

Results

The unified model achieved performance comparable to single-task baselines. The three previously modeled tasks maintained robust performance (sleep staging: macro F1 0.77, accuracy 80.6%; OSA: macro F1 0.78, accuracy 89.5%; snore detection: macro F1 0.89, accuracy 90.4%). The newly integrated tasks showed slight improvements relative to previously reported models: desaturation (F1 0.80, sensitivity 0.88, specificity 0.92 for desaturation-containing epochs), arousal (F1 0.67, sensitivity 0.74, specificity 0.91 for arousal-containing epochs), and sleep position (F1 0.88, sensitivity 0.89, specificity 0.74 for supine epochs; overall accuracy 84.0%, macro F1 0.82).

Conclusion

The ability to jointly infer diverse sleep events from a single multi-head classifier suggests that these tasks rely on overlapping physiological representations. This scalable architecture enables efficient multitask learning without loss of accuracy and provides a strong basis for developing a sound-based foundation model capable of comprehensive sleep analysis.
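
To make the multi-head idea concrete, here is a minimal pure-Python sketch of a shared representation feeding four lightweight MLP heads. It is not the authors' implementation. Only the 80 frames per 30-second epoch and the four task names come from the abstract. The Mel-bin count, layer widths, binary outputs per head, the mean-pooling trunk, and the "any positive sub-epoch makes the epoch positive" rule are all assumptions chosen for illustration.

```python
import math
import random

FRAMES_PER_EPOCH = 80   # stated in the abstract
EPOCH_SECONDS = 30      # stated in the abstract
N_MELS = 20             # hypothetical number of Mel bins
SHARED_DIM = 16         # hypothetical width of the shared representation
HEAD_HIDDEN = 8         # hypothetical hidden width of each MLP head
# Two output classes per head is an assumption (e.g. supine vs. non-supine).
HEADS = {"osa_event": 2, "desaturation": 2, "arousal": 2, "sleep_position": 2}


def init_linear(rng, n_in, n_out):
    """Random dense layer: (weights[n_out][n_in], bias[n_out])."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    peak = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(rng):
    """One shared trunk plus a two-layer MLP head per task."""
    trunk = init_linear(rng, N_MELS, SHARED_DIM)
    heads = {
        name: (init_linear(rng, SHARED_DIM, HEAD_HIDDEN),
               init_linear(rng, HEAD_HIDDEN, n_classes))
        for name, n_classes in HEADS.items()
    }
    return trunk, heads


def forward(epoch, trunk, heads):
    """Compute the shared features once, then run every head on them."""
    # Stand-in for the shared classifier body: mean-pool frames, one dense layer.
    pooled = [sum(frame[m] for frame in epoch) / len(epoch) for m in range(N_MELS)]
    shared = relu(linear(pooled, trunk))
    return {
        name: softmax(linear(relu(linear(shared, hidden)), out))
        for name, (hidden, out) in heads.items()
    }


def epoch_contains_event(sub_epoch_flags):
    """Assumed aggregation: an epoch is positive if any sub-epoch is positive."""
    return any(sub_epoch_flags)


rng = random.Random(0)
trunk, heads = build_model(rng)
epoch = [[rng.random() for _ in range(N_MELS)] for _ in range(FRAMES_PER_EPOCH)]
probs = forward(epoch, trunk, heads)
# 0.375 s per frame, assuming the frames tile the epoch evenly.
seconds_per_frame = EPOCH_SECONDS / FRAMES_PER_EPOCH
```

In this sketch the shared features are computed once per epoch, so adding a task costs only the parameters of one small head rather than a full additional classifier, which is the scaling argument the abstract makes.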

Cite This Study

Kim et al. (Fri,) studied this question.

synapsesocial.com/papers/6a002126c8f74e3340f9bff3 https://doi.org/10.1093/sleep/zsag091.1259
