An LLM-driven framework analyzing unstructured patient narratives reproduced quantitative findings with 91.3% concordance, 90% contextual correctness, and a 3% hallucination rate.
Cross-Sectional (n=1,642)
Yes
Can an LLM-driven framework reliably assess multidimensional cancer burden from unstructured patient narratives compared to structured quantitative ratings?
An LLM-driven framework can reliably extract and scale qualitative insights regarding cancer burden from unstructured patient narratives, achieving high concordance with structured quantitative assessments.
1592

Background: Measurement of cancer burden increasingly informs survivorship research, health system design, and policy. Existing approaches rely on structured patient-reported outcome measures, which are costly to administer, culturally constrained, and difficult to scale globally. Advances in artificial intelligence (AI), particularly large language models (LLMs), offer the opportunity to systematically analyze unstructured patient narratives at scale. The ResPECT pilot study evaluates whether an LLM-driven framework can assess multidimensional cancer burden from patient-reported free text.

Methods: Adult cancer patients and survivors provided structured burden ratings (Likert scale, 1–10) and open-text narratives describing physical, psychological, social, financial, and other impacts of cancer and its treatment. Unstructured text was analyzed using a retrieval-augmented generation (RAG) architecture combining dense semantic embeddings (Nomic Embed), an open-weight LLM (Mistral 7B), and an ensemble retriever integrating vector similarity search and BM25 retrieval. Performance was evaluated across three predefined domains: (1) analytical concordance with quantitative analyses across 23 hypotheses, and (2) contextual correctness and (3) hallucination rate over 50 questions, assessed by two independent reviewers.

Results: A total of 1,642 participants from 23 countries were analyzed; 66.9% identified as female and 22.0% identified as part of an underrepresented group. Overall, impact on psychological well-being was rated significantly higher than impact on physical, social, or financial well-being (all p≤0.001). Younger participants and individuals identifying with underrepresented groups reported significantly higher overall and domain-specific burden, while U.S. participants reported greater financial burden than UK participants (p≤0.001). The LLM-based analyses reproduced quantitative findings in 21 of 23 hypotheses (91.3% concordance). Contextual correctness was 90% (90 of 100), and hallucinations occurred in 3% (3 of 100) of generated responses, predominantly involving minor paraphrasing of patient quotations. The LLM consistently identified key themes, particularly related to psychological well-being, including emotional stress, depression, relationship strain, isolation, guilt, and hope.

Conclusions: The ResPECT initiative demonstrates that an LLM-driven framework can reliably provide and effectively scale qualitative insights into cancer burden from a patient perspective. While we acknowledge that the current pilot is not representative of cancer patients worldwide, the ResPECT approach may be scalable to derive comparable and actionable insights to inform investment in cancer services around the world.
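The ensemble retriever described in the Methods (vector similarity search combined with BM25) can be illustrated with a minimal sketch. The abstract does not say how the two rankings were fused, so reciprocal rank fusion is an assumption here, as are all function names and parameter values. The "dense" side uses a toy term-count cosine similarity purely as a stand-in for Nomic Embed vectors; it is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query, docs):
    """Toy stand-in for dense retrieval: cosine over term-count vectors.

    A real pipeline would compare embedding vectors from a model instead.
    """
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = []
    for d in docs:
        v = Counter(tokenize(d))
        dot = sum(q[t] * v[t] for t in q)
        d_norm = math.sqrt(sum(x * x for x in v.values()))
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores


def ranking(scores):
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def ensemble_retrieve(query, docs, top_k=2, rrf_k=60):
    """Fuse the BM25 and similarity rankings with reciprocal rank fusion."""
    if not docs:
        return []
    fused = [0.0] * len(docs)
    for scores in (bm25_scores(query, docs), cosine_scores(query, docs)):
        for rank, idx in enumerate(ranking(scores), start=1):
            fused[idx] += 1.0 / (rrf_k + rank)
    return [docs[i] for i in ranking(fused)[:top_k]]


# Hypothetical narratives, invented for illustration only.
narratives = [
    "the treatment bills left me with debt and financial stress",
    "i felt isolated and depressed after my diagnosis",
    "fatigue and pain made daily walking difficult",
]
top = ensemble_retrieve("financial debt", narratives, top_k=1)
```

In a RAG setup of the kind the abstract describes, the retrieved narratives would then be passed to the LLM as context for answering each question.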
Pfob et al. (Wed,) conducted a cross-sectional study in cancer (n=1,642). An LLM-driven framework (RAG architecture with Mistral 7B) was evaluated against quantitative analyses on analytical concordance across 23 hypotheses. The framework reproduced quantitative findings with 91.3% concordance, 90% contextual correctness, and a 3% hallucination rate.
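As a quick arithmetic check, the three reported rates follow directly from the counts given in the abstract (21 of 23, 90 of 100, 3 of 100). The helper name `rate` is hypothetical and used only for illustration.

```python
def rate(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


concordance = rate(21, 23)    # hypotheses reproduced by the LLM-based analyses
correctness = rate(90, 100)   # responses judged contextually correct
hallucination = rate(3, 100)  # responses containing hallucinations
```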