What question did this study set out to answer?

March 15, 2026 | Open Access

Characterization and comparison of structured and unstructured electronic health record data mapped to MedDRA for post-marketing surveillance

Key Points

This research aims to compare structured and unstructured electronic health record data for post-marketing surveillance.
Mapped both structured and unstructured EHR data to a common vocabulary, MedDRA.
Randomly sampled 15,000 encounters (5000 each from ambulatory, emergency, and inpatient settings).
Extracted MedDRA concepts from clinical notes via natural language processing with MetaMap.
Evaluated corroboration between structured ICD-10-CM diagnoses and unstructured data.
Explored similarities and differences at multiple levels of the MedDRA hierarchy.
Processed 119,492 clinical notes and mapped 163,254 ICD-10-CM codes to MedDRA.
In most encounters (73–98%), at least some MedDRA preferred terms overlapped between structured and unstructured data.
Among MedDRA concepts found in unstructured text, 80–95% were not present in the encounter's associated ICD-10-CM coded data.

Abstract

Objectives

Medical product safety surveillance efforts, whether using electronic health record (EHR) or claims data, typically rely on structured codes. Utilizing unstructured EHR data, particularly information extracted from clinical text through natural language processing (NLP), enriches the information available for data mining, phenotyping, and surveillance. To assess overlapping and distinct information across structured and unstructured EHR data, we mapped both to a common vocabulary (Medical Dictionary for Regulatory Activities, MedDRA). We assessed the feasibility of implementing such a mapping and explored similarities and differences at multiple levels of the concept hierarchy.

Materials and Methods

We randomly sampled 15,000 encounters (5000 each from ambulatory, emergency, and inpatient encounters). For each encounter, we extracted MedDRA concepts from clinical notes using MetaMap and mapped structured ICD-10-CM diagnoses to MedDRA. We evaluated corroboration between data sources across the MedDRA hierarchy, as well as the unique information contributed by each source.

Results

We processed 119,492 clinical notes and mapped 163,254 ICD-10-CM codes to MedDRA. Most encounters (73–98%) had some overlap between MedDRA preferred terms identified from structured and unstructured data. Among MedDRA concepts found in unstructured text, 80–95% were not found in the encounter’s associated ICD-10-CM coded data.

Discussion and Conclusion

While MedDRA concepts from structured data were mostly corroborated by those extracted from unstructured clinical text, the majority of MedDRA concepts recognized in each encounter were only mentioned in text. Leveraging MedDRA-encoded unstructured text can provide a more comprehensive clinical picture of patients and complement the structured data traditionally used in epidemiological and pharmacovigilance studies.
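The per-encounter comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's code: the function names (`encounter_overlap`, `roll_up`), the toy term sets, and the small preferred-term-to-system-organ-class lookup are assumptions made for illustration, and the metric definitions are one plausible reading of "corroboration" and "unique information". The sketch assumes both data sources have already been mapped to MedDRA preferred terms for a single encounter.

```python
from typing import Dict, Set


def encounter_overlap(structured: Set[str], text: Set[str]) -> Dict[str, object]:
    """Compare two sets of MedDRA terms for one encounter (hypothetical helper)."""
    shared = structured & text          # terms found in both sources
    text_only = text - structured       # terms mentioned only in clinical notes
    return {
        "has_overlap": bool(shared),
        # share of structured terms that the text corroborates
        "frac_structured_corroborated": len(shared) / len(structured) if structured else 0.0,
        # share of text-derived terms absent from the coded data
        "frac_text_only": len(text_only) / len(text) if text else 0.0,
    }


def roll_up(terms: Set[str], parent_of: Dict[str, str]) -> Set[str]:
    """Map terms to a higher hierarchy level; terms missing from the lookup are dropped."""
    return {parent_of[t] for t in terms if t in parent_of}


# Toy lookup from preferred term to system organ class (illustrative subset only).
PT_TO_SOC = {
    "Hypertension": "Vascular disorders",
    "Nausea": "Gastrointestinal disorders",
    "Vomiting": "Gastrointestinal disorders",
    "Headache": "Nervous system disorders",
    "Dizziness": "Nervous system disorders",
}

# Toy encounter: terms from mapped ICD-10-CM codes vs. terms extracted from notes.
structured_pts = {"Hypertension", "Nausea"}
text_pts = {"Nausea", "Vomiting", "Headache", "Dizziness"}

pt_level = encounter_overlap(structured_pts, text_pts)
soc_level = encounter_overlap(
    roll_up(structured_pts, PT_TO_SOC), roll_up(text_pts, PT_TO_SOC)
)
print("Preferred-term level:", pt_level)
print("System-organ-class level:", soc_level)
```

Repeating the same comparison after rolling terms up the hierarchy shows how agreement between sources can change with the level of granularity, which is the kind of multi-level comparison the abstract refers to.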
