What question did this study set out to answer?

To evaluate the effectiveness of a retrieval-augmented generation system for diagnosing ultra-rare diseases.

April 11, 2026 | Open Access

Retrieval-augmented generation improves diagnostic accuracy for ultra-rare diseases: a multi-source knowledge system evaluated on 85 clinical vignettes

Key Points

The study evaluated the effectiveness of a retrieval-augmented generation (RAG) system for diagnosing ultra-rare diseases.
The system combines structured knowledge retrieval from Orphanet and PubMed with LLM reasoning.
It was evaluated on 85 clinical vignettes, including 70 ultra-rare disease cases.
Diagnostic accuracy with RAG was compared against LLM-only responses.
RAG achieved a top-1 diagnostic accuracy of 54.3% for ultra-rare diseases, compared with 38.6% for LLM-only (p=0.001).
This corresponds to an absolute improvement of 15.7 percentage points.
Retrieval failure was the main limitation, accounting for 42.5% of missed diagnoses.
HPO phenotype matching was the most effective retrieval component.

Abstract

This study develops and evaluates a retrieval-augmented generation (RAG) system for rare disease diagnosis, combining structured knowledge retrieval from Orphanet (4,293 diseases) and PubMed case reports (1,832 chunks) with LLM reasoning. The system is evaluated on 85 clinical vignettes, including 70 ultra-rare disease cases.

Key Findings:
RAG achieved 54.3% top-1 diagnostic accuracy on ultra-rare diseases vs 38.6% for LLM-only (p=0.001).
Absolute improvement of +15.7 percentage points; number needed to diagnose = 6.4.
RAG was never inferior to the LLM-only baseline across all 70 ultra-rare cases.
Retrieval failure, not generation failure, was the primary bottleneck (42.5% of missed cases).
HPO phenotype matching was the most valuable retrieval component; cross-encoder reranking was counterproductive.
LLM confidence calibration was poor: 93% of predictions were labelled high confidence regardless of correctness.

Implications: Coupling structured biomedical knowledge retrieval with LLM reasoning yields statistically significant improvements in rare disease diagnosis. Retrieval quality, not model capability, is the binding constraint. These findings support development of knowledge-augmented diagnostic tools for the ~300 million people globally affected by rare diseases.

Contents: Main manuscript (PDF and DOCX), supplementary information (PDF with additional tables and per-case results), and high-resolution main figures (5 PNG).
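The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the source: the case counts (38 and 27 correct out of 70) are inferred from the reported percentages, the number needed to diagnose is taken as the reciprocal of the absolute accuracy gain, and the p-value is reproduced with an exact McNemar test on discordant pairs, which may not be the test the author actually used. If RAG was never inferior, every discordant case favours RAG, so there would be 11 such cases and none in the other direction.

```python
from math import comb

# Stated in the abstract.
N_ULTRA_RARE = 70

# Assumptions: counts inferred from the reported 54.3% and 38.6%.
RAG_CORRECT = 38
LLM_CORRECT = 27


def pct(k: int, n: int) -> float:
    """Percentage k/n rounded to one decimal place."""
    return round(100 * k / n, 1)


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to k, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


rag_acc = pct(RAG_CORRECT, N_ULTRA_RARE)
llm_acc = pct(LLM_CORRECT, N_ULTRA_RARE)

# Absolute gain as a proportion, and its reciprocal (assumed NND definition).
gain = (RAG_CORRECT - LLM_CORRECT) / N_ULTRA_RARE
nnd = 1 / gain

# Assumption: "never inferior" means zero cases where only the LLM was correct.
rag_only = RAG_CORRECT - LLM_CORRECT
llm_only = 0
p_value = exact_mcnemar_p(rag_only, llm_only)

print(f"RAG top-1 accuracy: {rag_acc}%")
print(f"LLM-only top-1 accuracy: {llm_acc}%")
print(f"Absolute gain: {100 * gain:.1f} percentage points")
print(f"Number needed to diagnose: {nnd:.1f}")
print(f"Exact McNemar p-value: {p_value:.4f}")
```

Under these assumptions the computed values round to the figures reported in the abstract (54.3%, 38.6%, +15.7 points, 6.4, and p of about 0.001), which suggests the reported numbers are internally consistent. It does not confirm the underlying per-case data, which are in the supplementary information.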

Authors

Hayden Farquhar
