What is the clinical evidence from this study?

Study design: Observational. Population: Participants with diagnosed or suspected moderate-to-severe obstructive sleep apnea (OSA) (n=147). Intervention: AHI 4% scoring definition vs. AHI 3%/arousal scoring definition. Primary outcome: Overall classification disagreement in short-interval comparisons.

What question did this study set out to answer?

This research investigates night-to-night variability in polysomnography metrics and its influence on diagnosing obstructive sleep apnea.

June 18, 2026 | Open Access

Comprehensive Evaluation of Night-to-Night Variability in PSG Metrics and AHI-Based Diagnostic Reclassification

Key Result

The AHI 4% scoring definition resulted in higher overall classification disagreement than AHI 3%/arousal in short-interval comparisons (29.9% vs. 21.2%).
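The disagreement percentages above can be illustrated with a minimal sketch. It assumes "classification disagreement" means the share of participants whose severity category differs between the two nights, and it assumes the conventional severity cutpoints of 5, 15 and 30 events/h; the summary does not state exactly how the study applied them. The function names and the AHI values are hypothetical. Running the same functions once on each night's AHI 4% values and once on the AHI 3%/arousal values gives the kind of side-by-side comparison reported here.

```python
from typing import Sequence

# Assumed severity cutpoints (events/h): 5 (mild), 15 (moderate), 30 (severe).
# These are the conventional values; how the study applied them is not stated.
CUTPOINTS = (5.0, 15.0, 30.0)


def severity(ahi: float, cutpoints: Sequence[float] = CUTPOINTS) -> int:
    """Severity band index: 0 = below the first cutpoint, len(cutpoints) = top band."""
    return sum(ahi >= c for c in cutpoints)


def overall_disagreement(night1: Sequence[float], night2: Sequence[float],
                         cutpoints: Sequence[float] = CUTPOINTS) -> float:
    """Share of participants whose severity band differs between the two nights."""
    changed = sum(severity(a, cutpoints) != severity(b, cutpoints)
                  for a, b in zip(night1, night2))
    return changed / len(night1)


def threshold_disagreement(night1: Sequence[float], night2: Sequence[float],
                           threshold: float) -> float:
    """Share of participants who fall on opposite sides of a single threshold."""
    crossed = sum((a >= threshold) != (b >= threshold)
                  for a, b in zip(night1, night2))
    return crossed / len(night1)


if __name__ == "__main__":
    # Hypothetical AHI values (events/h) for five participants; not study data.
    night1 = [12.0, 16.0, 31.0, 45.0, 8.0]
    night2 = [17.0, 14.0, 28.0, 50.0, 9.0]
    print(overall_disagreement(night1, night2))          # → 0.6
    print(threshold_disagreement(night1, night2, 15.0))  # → 0.4
```

In this toy example, the single-threshold figure (15 events/h) is lower than the overall figure because a change of category only counts there when a participant crosses that one cutpoint.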

Key Points

Retrospective analysis of a prospective study with 147 participants undergoing two PSGs within 10 days.
Quantification of night-to-night variability across 20 PSG-derived metrics using principal component analysis (PCA) and k-means clustering.
Comparison of diagnostic stability across different AHI definitions (3%/arousal vs. 4%) and hypoxic burden risk categories.
AHI 4% showed higher classification disagreement than AHI 3%/arousal in both short-interval (29.9% vs. 21.2%) and longitudinal comparisons (45.9% vs. 31.1%).
Hypoxic burden metrics demonstrated low inter-night disagreement (11.8%).
Calibration models aligned AHI 4% thresholds of 6.1-6.9 and 18.4-22.3 events/h with AHI 3%/arousal cutpoints of 15 and 30 events/h.
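The summary does not specify which calibration models produced the AHI 4% thresholds, and the reported ranges suggest more than one model was used. The minimal sketch below shows only one simple option as an assumption: an ordinary least-squares line predicting AHI 4% from AHI 3%/arousal, evaluated at the 15 and 30 events/h cutpoints. The helper names and the paired values are hypothetical and are not the study's data.

```python
# Hypothetical sketch: fit AHI 4% on AHI 3%/arousal with ordinary least squares,
# then read off the fitted AHI 4% value at each AHI 3%/arousal cutpoint.
def fit_line(x, y):
    """Return (intercept, slope) of the OLS line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def calibrated_thresholds(ahi3, ahi4, cutpoints=(15.0, 30.0)):
    """Map AHI 3%/arousal cutpoints onto the AHI 4% scale via the fitted line."""
    intercept, slope = fit_line(ahi3, ahi4)
    return [intercept + slope * c for c in cutpoints]


if __name__ == "__main__":
    # Invented paired values (events/h) for illustration only; not study data.
    ahi3 = [10.0, 20.0, 30.0, 40.0]
    ahi4 = [4.0, 9.0, 14.0, 19.0]
    print(calibrated_thresholds(ahi3, ahi4))  # → [6.5, 14.0]
```

Regressing AHI 4% on AHI 3%/arousal answers "what AHI 4% value is expected at a given AHI 3%/arousal value". The reverse regression, or a search for the AHI 4% threshold that maximizes agreement, would generally give somewhat different cutpoints.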

Study Design

Type

Observational (n=147)

Structured PICO

What is the night-to-night variability across PSG-derived metrics, and how do AHI scoring definitions influence diagnostic stability in OSA?

Population

147 participants with a prior diagnosis or high pretest likelihood of moderate-to-severe OSA who underwent two polysomnograms within 10 days.

Exposure

Two polysomnograms (PSGs) within 10 days

Outcome

Night-to-night variability (NtNV) across 20 PSG-derived metrics and diagnostic stability across AHI definitions (surrogate outcome)

AHI 4% criteria for OSA diagnosis show greater night-to-night classification disagreement than AHI 3%/arousal criteria, driven by threshold differences.

Main Result

Overall classification disagreement (short-interval): 29.9% with AHI 4% vs. 21.2% with AHI 3%/arousal

Abstract

BACKGROUND: Night-to-night variability (NtNV) in polysomnography (PSG) contributes to diagnostic uncertainty in obstructive sleep apnea (OSA), yet multi-metric evaluations using closely spaced PSG nights, particularly in moderate-to-severe disease, remain limited. The comparative stability of apnea-hypopnea index (AHI) definitions, hypoxic burden (HB), and threshold calibration remains unclear.

RESEARCH QUESTION: What is the NtNV across PSG-derived metrics, and how do AHI scoring definitions and threshold calibration influence diagnostic stability in OSA?

STUDY DESIGN AND METHODS: We performed a retrospective analysis of a prospective study including 147 participants with a prior diagnosis or high pretest likelihood of moderate-to-severe OSA who underwent two PSGs within 10 days. NtNV was quantified across 20 PSG-derived metrics. A normalized NtNV matrix was analyzed using principal component analysis (PCA) followed by unsupervised k-means clustering to identify data-driven variability-pattern groups. Diagnostic stability was compared across AHI definitions (3%/arousal vs. 4%) and HB risk categories. Statistical calibration models derived AHI 4% thresholds aligned with AHI 3%/arousal severity cutpoints.

RESULTS: […], and HB were most stable. In the PCA followed by k-means analysis, respiratory event frequency metrics contributed most strongly and separated participants into lower- and higher-respiratory-variability pattern groups. AHI 4% showed higher classification disagreement than AHI 3%/arousal in short-interval (29.9% vs. 21.2% overall; 14.3% vs. 5.4% at the moderate-to-severe threshold) and longitudinal comparisons (45.9% vs. 31.1%; 20.9% vs. 8.2%). HB showed low inter-night disagreement (11.8%). Calibration models aligned AHI 4% thresholds of 6.1-6.9 and 18.4-22.3 events/h with AHI 3%/arousal cutpoints of 15 and 30 events/h.

INTERPRETATION: Positional, autonomic, and sleep architecture metrics showed the highest NtNV; respiratory event frequency metrics were intermediate and oxygenation metrics were the most stable. Greater classification disagreement with AHI 4% was threshold-driven, with implications for hypopnea scoring and payer policy in OSA diagnosis.
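The variability-pattern analysis summarized in the abstract (a normalized NtNV matrix, then PCA, then k-means) can be sketched as follows. Several details are assumptions, because the summary does not give them. Per-metric NtNV is taken here as the absolute night-to-night difference. Normalization is a per-metric z-score. Only the first principal component is kept, computed by power iteration on the correlation matrix. k-means uses k = 2 with a deterministic initialization. All names are hypothetical, and this is not the authors' code.

```python
import math


def ntnv_matrix(night1, night2):
    """Assumed NtNV: |night 2 - night 1| per metric. Rows = participants, columns = metrics."""
    return [[abs(b - a) for a, b in zip(r1, r2)] for r1, r2 in zip(night1, night2)]


def zscore_columns(matrix):
    """Normalize each metric (column) to mean 0 and sample SD 1."""
    columns = []
    for col in zip(*matrix):
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        columns.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*columns)]


def first_principal_component(z, iters=500):
    """Leading eigenvector of the correlation matrix of z, via power iteration."""
    n, p = len(z), len(z[0])
    corr = [[sum(z[r][i] * z[r][j] for r in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def kmeans_1d(scores, k=2, iters=100):
    """Lloyd's k-means on 1-D scores (k >= 2), initialized across the sorted scores."""
    s = sorted(scores)
    centroids = [s[round(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    labels = [0] * len(scores)
    for _ in range(iters):
        # Assign each score to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in scores]
        updated = []
        for c in range(k):
            members = [x for x, lab in zip(scores, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def variability_groups(night1, night2, k=2):
    """NtNV matrix -> z-scores -> PC1 scores -> k-means cluster labels."""
    z = zscore_columns(ntnv_matrix(night1, night2))
    pc1 = first_principal_component(z)
    scores = [sum(zi * vi for zi, vi in zip(row, pc1)) for row in z]
    labels, _ = kmeans_1d(scores, k)
    return labels, pc1


if __name__ == "__main__":
    # Hypothetical data: six participants x three metrics; not study data.
    night1 = [[0.0, 0.0, 0.0]] * 6
    night2 = [[1.0, 1.0, 1.0], [1.2, 0.9, 1.1], [0.8, 1.1, 0.9],
              [5.0, 5.0, 5.0], [5.2, 4.9, 5.1], [4.8, 5.1, 4.9]]
    labels, loadings = variability_groups(night1, night2)
    print(labels, [round(x, 3) for x in loadings])
```

The PC1 loadings indicate which metrics drive the separation. In the study, respiratory event frequency metrics contributed most strongly. A fuller analysis would keep several components and compare several values of k.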