Abstract

Background: Drug-induced pulmonary arterial hypertension (DIPAH) represents a critical subset of PAH cases, caused by exposure to drugs with pulmonary vascular toxicity. Identifying drugs potentially associated with PAH in the ‘real world’ remains challenging due to the complexity of the PAH landscape: comorbidities and comedications can modify the course of PAH in patients at risk due to prior drug exposure. We integrated real-world data from pharmacovigilance and electronic health records (EHRs) with large language models (LLMs) to systematically identify drug associations in patients at risk for DIPAH.

Methods: Our study used structured pharmacovigilance reports from the FDA Adverse Event Reporting System (FAERS). A case cohort of patients ‘at risk’ for PAH (n = 2,493) was defined by exposure to established PAH-associated drugs, including dasatinib, methamphetamine, and various chemotherapeutic agents. We excluded individuals with a history of PAH or PAH-associated conditions. A 1:3 propensity score match yielded a final dataset of 9,969 matched records (2,493 cases, 7,476 controls).

LLM training: Patient-level contextual data were tokenized for LLM processing, resulting in 9,583 input sequences (parameters: max_seq_length=512, learning_rate=2e-5). For causal inference, we integrated the classification probabilities derived from our LLM with conventional structured features. Predictive associations were subsequently validated using an external EHR source, replicating the study’s inclusion and exclusion criteria.

Results: The LLM demonstrated robust performance in classifying PAH risk, achieving an area under the curve (AUC) of 0.97, accuracy of 0.95, precision of 0.91, and recall of 0.86. Our model identified several drugs strongly associated with PAH, including carfilzomib (z-score: 8.5), bevacizumab (5.33), trastuzumab (5.28), and ruxolitinib (4.49).
Subsequent validation using EHRs confirmed these associations, revealing significantly higher PAH incidence in patients treated with these drugs: carfilzomib (7.38%; OR: 1.76, p < 0.0001), bevacizumab (3.34%; OR: 1.21, p < 0.0001), trastuzumab (2.31%; OR: 1.21, p = 0.0018), and ruxolitinib (1.62%; OR: 1.36, p = 0.0052). The LLM also identified five drugs with established therapeutic roles in PAH: furosemide (z-score: 4.25), rituximab (3.54), carvedilol (2.58), nifedipine (2.14), and amlodipine (1.81), underscoring its predictive utility in PAH.

Conclusion: Our integrative framework, combining LLMs with causal inference, identified both known and emerging drug associations for PAH. It demonstrates the value of merging large-scale pharmacovigilance and EHR data with deep contextual modeling to navigate ‘real world’ complexity and pinpoint key drug associations for PAH.

This abstract is funded by: Cincinnati Children’s Hospital Medical Center’s Trustee Award (PI: Sarangdhar)
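The abstract does not say how the 1:3 propensity score match was carried out. As a rough illustration only, the sketch below shows one common approach, greedy nearest-neighbour matching without replacement. The function name `match_controls`, the `caliper` threshold, and the toy scores are all hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def match_controls(
    case_scores: Sequence[float],
    control_scores: Sequence[float],
    ratio: int = 3,
    caliper: float = 0.05,  # assumed maximum score distance; not from the abstract
) -> Dict[int, List[int]]:
    """Greedy 1:ratio nearest-neighbour matching without replacement.

    Returns a mapping from each case index to the indices of its matched controls.
    A case may receive fewer than `ratio` controls if too few lie within the caliper.
    """
    available = set(range(len(control_scores)))
    matches: Dict[int, List[int]] = {}
    for case_idx, case_score in enumerate(case_scores):
        # Rank the still-unused controls by distance to this case's score.
        ranked = sorted(
            available,
            key=lambda j: (abs(control_scores[j] - case_score), j),
        )
        # Keep the closest controls that fall within the caliper.
        chosen = [
            j for j in ranked
            if abs(control_scores[j] - case_score) <= caliper
        ][:ratio]
        matches[case_idx] = chosen
        # Matched controls cannot be reused for later cases.
        available.difference_update(chosen)
    return matches


# Toy propensity scores (hypothetical): two cases and seven candidate controls.
cases = [0.30, 0.70]
controls = [0.29, 0.31, 0.33, 0.68, 0.71, 0.72, 0.10]
matched = match_controls(cases, controls)
```

In this toy run each case receives three controls, and the control with score 0.10 stays unmatched because it lies outside the caliper of both cases. Greedy matching depends on the order in which cases are processed; optimal matching is an alternative when that matters.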
Kolli et al. (Fri,) studied this question.
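For readers less familiar with the statistics quoted in the Results, the following minimal sketch shows how an odds ratio is computed from a 2×2 table and how an F1 score follows from precision and recall. The counts passed to `odds_ratio` are made up for illustration. The F1 value is simple arithmetic on the reported precision (0.91) and recall (0.86); the abstract itself does not report it.

```python
def odds_ratio(
    exposed_events: int,
    exposed_total: int,
    unexposed_events: int,
    unexposed_total: int,
) -> float:
    """Odds ratio from a 2x2 table: (a * d) / (b * c)."""
    a = exposed_events                       # exposed, with outcome
    b = exposed_total - exposed_events       # exposed, without outcome
    c = unexposed_events                     # unexposed, with outcome
    d = unexposed_total - unexposed_events   # unexposed, without outcome
    return (a * d) / (b * c)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed patients have the outcome.
example_or = odds_ratio(20, 100, 10, 100)

# Precision and recall as reported in the abstract.
derived_f1 = f1_score(0.91, 0.86)
```

With the hypothetical counts, the odds ratio is 2.25; the derived F1 is roughly 0.88.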