This study develops and evaluates a retrieval-augmented generation (RAG) system for rare disease diagnosis, combining structured knowledge retrieval from Orphanet (4,293 diseases) and PubMed case reports (1,832 chunks) with LLM reasoning. The system is evaluated on 85 clinical vignettes, including 70 ultra-rare disease cases.

Key Findings:
RAG achieved 54.3% top-1 diagnostic accuracy on ultra-rare diseases vs 38.6% for LLM-only (p=0.001).
Absolute improvement of +15.7 percentage points; number needed to diagnose = 6.4.
RAG was never inferior to the LLM-only baseline across all 70 ultra-rare cases.
Retrieval failure, not generation failure, was the primary bottleneck (42.5% of missed cases).
HPO phenotype matching was the most valuable retrieval component; cross-encoder reranking was counterproductive.
LLM confidence calibration was poor: 93% of predictions were labelled high confidence regardless of correctness.

Implications: Coupling structured biomedical knowledge retrieval with LLM reasoning yields statistically significant improvements in rare disease diagnosis. Retrieval quality, not model capability, is the binding constraint. These findings support development of knowledge-augmented diagnostic tools for the ~300 million people globally affected by rare diseases.

Contents: Main manuscript (PDF and DOCX), supplementary information (PDF with additional tables and per-case results), and high-resolution main figures (5 PNG).
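The headline figures in the key findings can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. The abstract reports percentages, not case counts, so the counts of 38 and 27 correct diagnoses out of 70 are assumptions inferred from 54.3% and 38.6%. The abstract also does not name the significance test. The exact McNemar calculation here is an assumed choice, and it takes "never inferior" to mean that every discordant case favoured RAG.

```python
from math import comb

N_CASES = 70       # ultra-rare vignettes, as stated in the abstract
RAG_CORRECT = 38   # ASSUMPTION: inferred from 54.3% of 70
LLM_CORRECT = 27   # ASSUMPTION: inferred from 38.6% of 70


def accuracy(correct: int, total: int) -> float:
    """Top-1 accuracy as a fraction."""
    return correct / total


def number_needed_to_diagnose(acc_new: float, acc_base: float) -> float:
    """Reciprocal of the absolute accuracy gain."""
    return 1.0 / (acc_new - acc_base)


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value for discordant pair counts b and c."""
    n = b + c
    k = min(b, c)
    # Binomial(n, 0.5) tail up to the smaller discordant count, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


rag_acc = accuracy(RAG_CORRECT, N_CASES)
llm_acc = accuracy(LLM_CORRECT, N_CASES)
gain_pp = 100 * (rag_acc - llm_acc)
nnd = number_needed_to_diagnose(rag_acc, llm_acc)

# ASSUMPTION: all discordant cases favour RAG (RAG right, LLM-only wrong),
# so the other discordant cell is zero.
p_value = exact_mcnemar_p(RAG_CORRECT - LLM_CORRECT, 0)

print(f"RAG accuracy:      {100 * rag_acc:.1f}%")
print(f"LLM-only accuracy: {100 * llm_acc:.1f}%")
print(f"Absolute gain:     {gain_pp:.1f} percentage points")
print(f"NND:               {nnd:.1f}")
print(f"Exact McNemar p:   {p_value:.3f}")
```

Under these assumed counts, the rounded accuracies, the 15.7-point gain, the number needed to diagnose of 6.4, and a p-value of about 0.001 all agree with the reported values. The manuscript and supplementary per-case results remain the authoritative source for the actual counts and test.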
Hayden Farquhar
Hayden Farquhar (Thu,) studied this question.
www.synapsesocial.com/papers/69d9e63478050d08c1b76885 — DOI: https://doi.org/10.5281/zenodo.19477876