What question did this study set out to answer?

The study aimed to evaluate whether AI can effectively replace the first reader in detecting lung cancer on chest radiographs.

March 16, 2026 | Open Access

Can AI substitute the first reader in chest radiograph screening? A retrospective non-inferiority evaluation

Key Points

The study aimed to evaluate whether AI can effectively replace the first reader in detecting lung cancer on chest radiographs.
The retrospective evaluation covered 155,503 participants who underwent 320,329 screening examinations.
Three AI models were compared with first readers on detection of suspected lung cancer.
Localization accuracy of the AI outputs was evaluated, and detection rates were compared using McNemar’s test.
AI detection rates ranged from 62.5% to 77.3%, exceeding first readers' 59.3%.
In nodule/mass analysis, AI detection ranged from 64.5% to 76.5%, also higher than the 59.2% from first readers.
False-positive rates per examination were notably higher for AI models than for first readers (0.065 to 0.147 vs. 0.002).
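
To put the per-examination false-positive rates in workload terms, the short sketch below converts each reported rate into false positives per 1,000 examinations and into a ratio against first readers. The rates are those reported in the abstract; the variable names and the per-1,000 scaling are illustrative choices, not part of the study.

```python
# False-positive rates per examination as reported in the abstract.
fp_per_exam = {
    "Software A": 0.081,
    "Software B": 0.065,
    "Software C": 0.147,
    "First readers": 0.002,
}

reader_rate = fp_per_exam["First readers"]

# Illustrative rescaling: false positives per 1,000 examinations and
# the ratio of each rate to the first-reader rate.
summary = {
    name: {
        "per_1000_exams": round(rate * 1000, 1),
        "ratio_vs_first_readers": round(rate / reader_rate, 1),
    }
    for name, rate in fp_per_exam.items()
}

for name, row in summary.items():
    print(f"{name}: {row['per_1000_exams']} per 1,000 exams, "
          f"{row['ratio_vs_first_readers']}x first readers")
```

These figures only rescale the reported rates; they say nothing about how many of the flagged examinations the second reader would go on to recall.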

Abstract

This study evaluated whether AI can substitute for the first reader in a double-reading workflow for lung-cancer detection on screening chest radiographs.

A retrospective analysis was conducted in a screening cohort at Ishikawa Health Service Association that included 155,503 participants undergoing 320,329 examinations between January 2018 and September 2020. From examinations initially identified as suspected lung cancer by the conventional double-reading system (n = 2,882), prespecified exclusions were applied, yielding 1,847 examinations for detection-performance analysis. AI-based lesion detection was retrospectively performed using three AI models, and the localization accuracy of the AI outputs was evaluated. Detection performance (AI vs. first readers) was compared using McNemar’s test with a non-inferiority margin of −0.05 (AI deemed non-inferior if the lower bound of the 95% CI for the difference in detection rates exceeded −0.05) in two settings: (1) all lesions and (2) pulmonary nodule/mass only. The false-positive rate per examination was estimated using 5,784 normal examinations (5,689 participants) performed between January and June 2018 with ≥2-year negative follow-up.

For all abnormalities, each AI model met the non-inferiority criterion relative to first readers and showed higher detection rates (AI, 62.5–77.3%; first readers, 59.3%). Similar findings were observed when the analysis was limited to nodule/mass only (AI, 64.5–76.5%; first readers, 59.2%). False-positive frequencies per examination were 0.081 (Software A), 0.065 (Software B), and 0.147 (Software C), versus 0.002 for first readers.

In a retrospective screening cohort, three AI models achieved non-inferior and overall higher detection performance compared with first readers for suspected lung cancer on chest radiographs. Despite higher false-positive rates, AI could feasibly assume the first-reader role within a conventional double-reading workflow while maintaining diagnostic quality. Prospective, multi-center studies are warranted to confirm effectiveness, quantify workflow impact, and assess downstream consequences of AI-assisted single reading.
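
The paired comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the abstract does not say how the 95% CI was computed, so a Wald interval for the paired difference is assumed here, and the discordant counts `b`, `c` and the total `n` in the usage line are hypothetical values chosen only to show the mechanics. The function name `paired_noninferiority` is likewise an invention for this example.

```python
import math

def paired_noninferiority(b, c, n, margin=-0.05, z=1.96):
    """Paired comparison of two readers on the same n examinations.

    b: examinations detected by AI only (hypothetical input)
    c: examinations detected by the first reader only (hypothetical input)
    n: total paired examinations
    margin: non-inferiority margin on the difference (AI minus first reader)
    """
    # Difference in detection rates depends only on the discordant pairs.
    diff = (b - c) / n
    # Wald standard error for a paired difference in proportions (an assumption;
    # the abstract does not state which CI method was used).
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n
    lower, upper = diff - z * se, diff + z * se
    # McNemar's chi-square statistic (no continuity correction), 1 degree of freedom.
    chi2 = (b - c) ** 2 / (b + c) if (b + c) > 0 else 0.0
    # Upper-tail probability of a chi-square(1) variable equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "diff": diff,
        "ci": (lower, upper),
        "mcnemar_chi2": chi2,
        "p_value": p_value,
        "non_inferior": lower > margin,
    }

# Hypothetical counts, not taken from the study.
result = paired_noninferiority(b=120, c=90, n=1000)
print(result)
```

Under this rule, AI is declared non-inferior whenever the lower confidence bound sits above −0.05, even if the point estimate of the difference is slightly negative. McNemar’s p-value addresses a separate question: whether the two detection rates differ at all.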
