Electronic health records (EHRs) contain large amounts of valuable clinical information, but a substantial portion of it exists in unstructured form, including physician notes, discharge summaries, and narrative clinical reports. Because these data are recorded as free text, they are difficult to aggregate, standardize, and analyze with conventional statistical or database methods, so much clinically relevant information remains underutilized in healthcare analytics and decision support systems. This study proposes a hybrid framework that combines artificial intelligence–based natural language processing (NLP) with stochastic modeling to transform unstructured EHR narratives into structured clinical datasets. The approach first applies AI-driven NLP techniques to identify and extract clinically meaningful entities, such as diagnoses, symptoms, medications, laboratory values, and procedures, from free-text clinical notes. The extracted information is then organized into relational tables suitable for large-scale analytics. To account for patient heterogeneity and uncertainty in clinical observations, stochastic analytical methods are applied to these tables to reconstruct latent health trajectories and estimate time-dependent risk patterns. The framework thereby integrates narrative clinical information into structured data environments, supporting more comprehensive analysis of patient health dynamics and a more realistic representation of disease progression and patient variability. This approach has the potential to enhance population health analytics, clinical research, and predictive modeling using EHR data.
Melo et al. studied this question.
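The three stages named in the abstract (entity extraction, relational organization, stochastic reconstruction of a latent trajectory) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy notes, the regular expressions standing in for an AI-driven NLP model, the `entities` table schema, the CRP threshold of 20 mg/L, and the two-state hidden Markov model with hand-picked transition and emission probabilities. The filter also ignores the number of days between visits, which a realistic model would need to account for.

```python
import re
import sqlite3

# Hypothetical toy notes as (patient_id, day, text); not data from the study.
NOTES = [
    (1, 0, "Pt reports cough. Dx: pneumonia. Started amoxicillin 500 mg. CRP 48 mg/L."),
    (1, 3, "Cough improving. Continue amoxicillin 500 mg. CRP 12 mg/L."),
]

# Regex rules are a stand-in for the AI-driven NLP step (assumption).
PATTERNS = {
    "symptom": re.compile(r"\b(cough|fever)\b", re.I),
    "diagnosis": re.compile(r"Dx:\s*([a-z ]+)\.", re.I),
    "medication": re.compile(r"\b(amoxicillin|metformin)\s+(\d+)\s*mg\b", re.I),
    "lab": re.compile(r"\b(CRP)\s+(\d+(?:\.\d+)?)\s*mg/L", re.I),
}


def extract_entities(text):
    """Return (entity_type, name, value) tuples found in one note."""
    found = []
    for etype, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            name = m.group(1).strip().lower()
            # Patterns with a second group carry a numeric value (dose or result).
            value = float(m.group(2)) if pattern.groups > 1 else None
            found.append((etype, name, value))
    return found


def build_table(notes):
    """Load extracted entities into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entities (patient_id INTEGER, day INTEGER, "
        "entity_type TEXT, name TEXT, value REAL)"
    )
    for pid, day, text in notes:
        rows = [(pid, day, *entity) for entity in extract_entities(text)]
        conn.executemany("INSERT INTO entities VALUES (?, ?, ?, ?, ?)", rows)
    return conn


# Two-state hidden Markov model: state 0 = stable, state 1 = acute (assumed values).
PRIOR = [0.5, 0.5]
TRANSITION = [[0.8, 0.2], [0.3, 0.7]]  # TRANSITION[i][j] = P(next = j | current = i)
P_ELEVATED = [0.1, 0.9]                # P(lab elevated | state)


def filter_acute(elevated_flags):
    """Forward filtering: P(acute at visit t | observations up to visit t)."""
    belief = PRIOR[:]
    trajectory = []
    for t, elevated in enumerate(elevated_flags):
        if t > 0:  # predict step: propagate the belief through the transition model
            belief = [sum(belief[i] * TRANSITION[i][j] for i in range(2)) for j in range(2)]
        likelihood = [p if elevated else 1.0 - p for p in P_ELEVATED]
        unnorm = [belief[s] * likelihood[s] for s in range(2)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]  # update step: Bayes rule
        trajectory.append(belief[1])
    return trajectory


if __name__ == "__main__":
    conn = build_table(NOTES)
    labs = conn.execute(
        "SELECT day, value FROM entities WHERE patient_id = 1 "
        "AND entity_type = 'lab' AND name = 'crp' ORDER BY day"
    ).fetchall()
    flags = [value > 20.0 for _, value in labs]  # assumed threshold for "elevated"
    for (day, _), p in zip(labs, filter_acute(flags)):
        print(f"day {day}: P(acute) = {p:.3f}")
```

In this sketch the relational table is the interface between the two halves of the framework: the stochastic step reads only structured rows, so the regex extractor could be replaced by a trained clinical NLP model without changing the filtering code.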