Abstract
Background Reproducible quantification of pulmonary edema on chest radiographs is challenging for bedside clinicians. The Radiographic Assessment of Lung Edema (RALE) score offers a structured, semi-quantitative approach with prognostic implications in acute respiratory failure (ARF), yet its clinical use is limited by the absence of standardized training.
Objective To determine whether a standardized RALE training program, combined with expert feedback, improves inter-rater reliability across clinician experience levels.
Methods We analyzed 4,487 chest radiographs from 864 critically ill patients with ARF enrolled in a prospective registry. Sixteen physicians (1 expert and 15 reviewers: 5 interns, 4 residents, 3 fellows, and 3 attendings) completed a three-phase RALE training program: i) independent review of instructional materials, ii) a live virtual session led by the expert, and iii) iterative practice with expert feedback on discrepant scores. All scoring was performed using the Pulmo-Annotator platform. We assessed inter-rater reliability using intraclass correlation coefficients (ICC), Bland–Altman analysis, and Deming regression, stratifying performance by experience level and self-reported confidence.
Results Inter-rater reliability improved following the training program, with ICC increasing from 0.89 (95% CI: 0.85–0.92) to 0.93 (95% CI: 0.90–0.95) after expert feedback. Fellows showed the largest gain, with ICC rising from 0.89 to 0.97, while interns and attendings maintained high reliability. Post-feedback analyses indicated reduced systematic bias. Residents exhibited the greatest variability and revised fewer scores post-feedback, despite equivalent exposure to feedback. Score distributions also varied by self-reported confidence, with greater variability among less experienced reviewers.
Conclusion Following a structured training program, radiographic edema scoring using the RALE framework demonstrated excellent inter-rater reliability, which further improved with targeted expert feedback. The program, which combined standardized instructional materials with iterative feedback, showed that RALE scoring is teachable across levels of clinical experience and can be integrated into educational curricula to improve consistency in chest radiograph interpretation. These findings provide a foundation for future work on the role of RALE training in broader research and clinical applications.
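For readers unfamiliar with the agreement statistics named in the Methods, the following is a minimal sketch of two of them, not the authors' analysis code. It assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1); the abstract does not state which ICC form was used. The function names `icc2_1` and `bland_altman` and the example scores are hypothetical.

```python
import numpy as np


def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ASSUMPTION: the abstract does not say which ICC form was used; this is
    one common choice. `scores` has shape (n_radiographs, k_raters).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for radiographs (rows), raters (columns), and residual.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired scores a and b.

    Bias is the mean difference a - b; the limits of agreement are
    bias +/- 1.96 sample standard deviations of the differences.
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = d.mean()
    sd = d.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical RALE scores: 5 radiographs, each scored by expert and reviewer.
    expert = [12, 20, 8, 30, 16]
    reviewer = [14, 18, 8, 33, 15]
    print(icc2_1(np.column_stack([expert, reviewer])))
    print(bland_altman(reviewer, expert))
```

A bias near zero with narrow limits of agreement indicates little systematic over- or under-scoring by the reviewer relative to the expert, which is the pattern the Results describe after feedback.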
Qurashi et al. (Fri,) studied this question.