What question did this study set out to answer?

The study evaluated whether a standardized training program improves inter-rater reliability among clinicians scoring pulmonary edema.

January 25, 2026 | Open Access

Improving Inter-Rater Reliability in Radiographic Edema Scoring in Acute Respiratory Failure Through Structured Training and Expert Feedback

Key points

The study tested whether a standardized RALE training program with expert feedback improves inter-rater reliability in pulmonary edema scoring across clinician experience levels.
The analysis covered 4,487 chest radiographs from 864 critically ill patients with acute respiratory failure (ARF).
Sixteen physicians (1 expert and 15 reviewers) took part in a three-phase Radiographic Assessment of Lung Edema (RALE) training program.
All scoring was performed on the Pulmo-Annotator platform; inter-rater reliability was assessed with intraclass correlation coefficients (ICC), Bland–Altman analysis, and Deming regression.
Overall ICC rose from 0.89 to 0.93 after expert feedback.
Fellows showed the largest gain, with ICC rising from 0.89 to 0.97.
Post-feedback analyses indicated reduced systematic bias; score variability was greater among less experienced reviewers.
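
For readers unfamiliar with the ICC figures quoted above, the following is a minimal sketch of how an intraclass correlation coefficient can be computed from a radiographs-by-raters score matrix. It assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1); the summary does not say which ICC form the authors used, so this choice is an assumption. The function name `icc2_1` and the example scores are hypothetical and are not study data.

```python
import numpy as np


def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is an (n_subjects, k_raters) array with no missing values.
    The ICC form is an assumption; the source does not specify one.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols                 # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Made-up scores for 5 radiographs (rows) from 3 raters (columns).
    toy_scores = [
        [12, 14, 13],
        [30, 28, 31],
        [8, 9, 7],
        [22, 25, 24],
        [40, 38, 41],
    ]
    print(f"ICC(2,1) = {icc2_1(toy_scores):.3f}")
```

Because this form measures absolute agreement, a rater who scores consistently higher than the others lowers the coefficient even when the rank order of radiographs is preserved.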

Abstract

Background: Reproducible quantification of pulmonary edema on chest radiographs is challenging for bedside clinicians. The Radiographic Assessment of Lung Edema (RALE) score offers a structured, semi-quantitative approach with prognostic implications in acute respiratory failure (ARF), yet its clinical use is limited by the absence of standardized training.

Objective: To determine whether a standardized RALE training program, combined with expert feedback, improves inter-rater reliability across clinician experience levels.

Methods: We analyzed 4,487 chest radiographs from 864 critically ill patients with ARF enrolled in a prospective registry. Sixteen physicians (1 expert and 15 reviewers: 5 interns, 4 residents, 3 fellows, and 3 attendings) completed a three-phase RALE training program: (i) independent review of instructional materials, (ii) a live virtual session led by the expert, and (iii) iterative practice with expert feedback on discrepant scores. All scoring was performed using the Pulmo-Annotator platform. We assessed inter-rater reliability using intraclass correlation coefficients (ICC), Bland–Altman analysis, and Deming regression, stratifying performance by experience level and self-reported confidence.

Results: Inter-rater reliability improved following the training program, with ICC increasing from 0.89 (95% CI: 0.85–0.92) to 0.93 (95% CI: 0.90–0.95) after expert feedback. Fellows demonstrated the largest gain, with ICC rising from 0.89 to 0.97, while interns and attendings maintained high reliability. Post-feedback analyses indicated reduced systematic bias. Residents exhibited the greatest variability and revised fewer scores post-feedback, despite equivalent exposure to feedback. Score distributions also varied by self-reported confidence, with greater variability among less experienced reviewers.

Conclusion: Following a structured training program, radiographic edema scoring using the RALE framework demonstrated excellent inter-rater reliability, which further improved with targeted expert feedback. The program, which incorporated standardized instructional materials and iterative feedback, showed that RALE scoring is teachable across levels of clinical experience and can be integrated into educational curricula to enhance consistency in chest radiograph interpretation. These findings provide a foundation for future work exploring the role of RALE training in broader research and clinical applications.
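
The Methods name Bland–Altman analysis and Deming regression as the tools used to look at systematic bias between raters. The sketch below shows, under stated assumptions, what those two calculations look like for a reviewer's scores compared against an expert's. The function names, the default error-variance ratio `delta=1.0`, and the example scores are all hypothetical; the abstract does not report the settings the authors used, and the numbers are not study data.

```python
import numpy as np


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired scores a and b.

    Bias is the mean of a - b; limits of agreement are bias +/- 1.96 SD.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def deming(x, y, delta=1.0):
    """Deming regression of y on x; returns (slope, intercept).

    `delta` is the assumed ratio of y-error variance to x-error variance.
    delta=1.0 (equal error in both raters) is an assumption, not a value
    taken from the source.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xm, ym = x.mean(), y.mean()
    sxx = np.sum((x - xm) ** 2) / (n - 1)
    syy = np.sum((y - ym) ** 2) / (n - 1)
    sxy = np.sum((x - xm) * (y - ym)) / (n - 1)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    term = syy - delta * sxx
    slope = (term + np.sqrt(term ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, ym - slope * xm


if __name__ == "__main__":
    # Made-up paired scores: expert vs. one reviewer on 6 radiographs.
    expert = [10, 18, 25, 31, 36, 44]
    reviewer = [12, 17, 27, 33, 35, 46]
    bias, lo, hi = bland_altman(reviewer, expert)
    slope, intercept = deming(expert, reviewer)
    print(f"bias={bias:.2f}, limits of agreement=({lo:.2f}, {hi:.2f})")
    print(f"Deming slope={slope:.3f}, intercept={intercept:.3f}")
```

A bias near zero with narrow limits of agreement, together with a Deming slope near 1 and an intercept near 0, would be consistent with little systematic difference between the two raters.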
