Introduction
Accurate grading of diabetic retinopathy is essential for effective screening, clinical decision-making, and evaluation of automated diagnostic systems. Conventional grading relies on categorical severity scales, which are subject to inter- and intra-observer variability, particularly among less-experienced or junior graders and in cases with subtle disease features. Comparative assessment using paired image comparisons may offer a complementary approach by reframing grading as a relative severity judgement and potentially reducing grading variability.

Methods
This pilot study evaluated retinal fundus photographs obtained from a publicly available dataset. Ninety images spanning the spectrum of diabetic retinopathy severity were graded using two approaches: direct grading according to the International Clinical Diabetic Retinopathy Severity Scale and comparative assessment using paired image comparisons. Both methods were performed twice by a junior clinician following structured training to assess repeatability. Classification performance for discrimination between the presence and absence of diabetic retinopathy was compared using confusion matrices and McNemar's test.

Results
Comparative assessment demonstrated higher overall accuracy and improved specificity compared with direct grading across repeated grading rounds, while maintaining high sensitivity. Paired image comparison showed greater consistency between grading attempts, whereas direct grading exhibited greater variability. Differences in classification performance between methods were statistically significant.

Conclusion
In this pilot study, comparative assessment using paired image comparisons outperformed conventional direct grading for discrimination between the presence and absence of diabetic retinopathy when applied by a junior grader. These findings suggest that relative severity judgement may represent a viable alternative or adjunct to traditional categorical grading systems, particularly in contexts where grading variability is a concern. Larger studies involving multiple graders and real-world screening images are required to validate these findings and define the clinical role of comparative assessment.
Mohammed Al-Roubaie (Thu,) studied this question.
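
The comparison described in the Methods (confusion matrices plus McNemar's test on paired classifications of the same images) can be illustrated with a minimal sketch. It is not the study's analysis code. The function names, the choice of the exact (binomial) form of McNemar's test, and the decision to pair the two methods on correct versus incorrect classification are all assumptions made for illustration, and the eight labels below are invented toy values, not study data.

```python
from math import comb


def confusion_counts(truth, pred):
    """Return (tp, fp, fn, tn) for binary labels: 1 = DR present, 0 = DR absent."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, pred):
        if t and p:
            tp += 1
        elif not t and p:
            fp += 1
        elif t and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summary(truth, pred):
    """Accuracy, sensitivity and specificity derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(truth, pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar test on paired correct/incorrect outcomes.

    b = images method A classifies correctly and method B does not,
    c = images method B classifies correctly and method A does not.
    Under the null hypothesis the discordant pairs split 50/50.
    """
    b = sum(pa == t and pb != t for t, pa, pb in zip(truth, pred_a, pred_b))
    c = sum(pa != t and pb == t for t, pa, pb in zip(truth, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy labels, for illustration only (not data from the study).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
direct = [1, 1, 1, 1, 1, 1, 1, 0]
comparative = [1, 1, 1, 1, 0, 0, 1, 0]

print(summary(truth, direct))
print(summary(truth, comparative))
print(mcnemar_exact(truth, direct, comparative))
```

Only the discordant pairs (images on which exactly one method is correct) contribute to the McNemar statistic, which is why the test suits a design in which both methods grade the same set of images. With a sample as small as the toy one here, the exact form is preferable to the chi-square approximation.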