March 3, 2026 | Open Access

Grading Diabetic Retinopathy Using Comparative Assessment: A Pilot Study Comparing Paired Image Comparisons With Direct Grading

Key Points

Comparative assessment offered higher accuracy and improved specificity than direct grading for diabetic retinopathy, along with more consistent outcomes across repeated attempts.
In a sample of 90 retinal images, the comparative method outperformed direct grading in both accuracy and repeatability when applied by a single junior grader.
Evaluation employed confusion matrices and McNemar's test to analyze classification performance between grading approaches.
Findings indicate that relative severity judgement may help reduce grading variability in clinical settings with less experienced graders.

Abstract

Introduction: Accurate grading of diabetic retinopathy is essential for effective screening, clinical decision-making, and evaluation of automated diagnostic systems. Conventional grading relies on categorical severity scales, which are subject to inter- and intra-observer variability, particularly among less-experienced or junior graders and in cases with subtle disease features. Comparative assessment using paired image comparisons may offer a complementary approach by reframing grading as a relative severity judgement and potentially reducing grading variability.

Methods: This pilot study evaluated retinal fundus photographs obtained from a publicly available dataset. Ninety images spanning the spectrum of diabetic retinopathy severity were graded using two approaches: direct grading according to the International Clinical Diabetic Retinopathy Severity Scale and comparative assessment using paired image comparisons. Both methods were performed twice by a junior clinician following structured training to assess repeatability. Classification performance for discrimination between the presence and absence of diabetic retinopathy was compared using confusion matrices and McNemar's test.

Results: Comparative assessment demonstrated higher overall accuracy and improved specificity compared with direct grading across repeated grading rounds, while maintaining high sensitivity. Paired image comparison showed greater consistency between grading attempts, whereas direct grading exhibited greater variability. Differences in classification performance between methods were statistically significant.

Conclusion: In this pilot study, comparative assessment using paired image comparisons outperformed conventional direct grading for discrimination between the presence and absence of diabetic retinopathy when applied by a junior grader. These findings suggest that relative severity judgement may represent a viable alternative or adjunct to traditional categorical grading systems, particularly in contexts where grading variability is a concern. Larger studies involving multiple graders and real-world screening images are required to validate these findings and define the clinical role of comparative assessment.
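
For readers unfamiliar with the evaluation described in the Methods, the following is a minimal Python sketch of how confusion-matrix metrics and an exact McNemar's test could be computed for two grading methods applied to the same images. It is an illustration under stated assumptions, not the authors' analysis code: the function names (`confusion_counts`, `summary_metrics`, `mcnemar_exact`) are hypothetical, the test is applied here to paired correct/incorrect outcomes (the abstract does not state the exact pairing used), and the ten labels are made-up toy values, not study data.

```python
from math import comb


def confusion_counts(truth, predicted):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = retinopathy present."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(truth, predicted):
    """Accuracy, sensitivity and specificity derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(truth, predicted)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def mcnemar_exact(truth, method_a, method_b):
    """Two-sided exact McNemar's test on paired correct/incorrect outcomes.

    Only discordant images (one method right, the other wrong) carry
    information. Under the null hypothesis each discordant image is equally
    likely to favour either method, so the smaller count follows a
    Binomial(n, 0.5) distribution.
    """
    triples = list(zip(truth, method_a, method_b))
    a_only = sum(1 for t, a, b in triples if a == t and b != t)  # A right, B wrong
    b_only = sum(1 for t, a, b in triples if a != t and b == t)  # B right, A wrong
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)


# Toy labels for ten hypothetical images (NOT data from the study).
truth       = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
direct      = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1]  # over-calls disease on three images
comparative = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # over-calls disease on one image

print("direct:     ", summary_metrics(truth, direct))
print("comparative:", summary_metrics(truth, comparative))
print("McNemar (direct-only correct, comparative-only correct, p):",
      mcnemar_exact(truth, direct, comparative))
```

The exact binomial form is used here because the number of discordant pairs can be small in a sample of this size; with many discordant pairs, the chi-squared approximation is a common alternative.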
