Locally run transformer-based language models extracted structured data from unstructured echocardiography and cardiac catheterization reports with mean accuracies of 95.7% and 94.9%, respectively.
Do fine-tuned transformer-based language models accurately extract structured data from unstructured echocardiography and cardiac catheterization reports?
Locally run, fine-tuned transformer-based language models can accurately extract structured data from unstructured echocardiography and cardiac catheterization reports, offering a privacy-preserving alternative to external APIs.
Abstract
Objectives: Echocardiography and cardiac catheterization reports capture important clinical assessment information on cardiac function and disease severity. This study explores using open-source transformer-based language models (LMs) run locally within an institutional environment, as a privacy-preserving alternative to external API-based large LMs, to systematically extract clinical data from unstructured echocardiography and cardiac catheterization reports, aiming to improve data accessibility for research and patient care.
Materials and Methods: Two transformer-based LMs, BioclinicalBERT and BART-Large-CNN, were fine-tuned in a secure local environment using a question-answering approach. The dataset included 3286 echocardiography and 1884 cardiac catheterization reports from Kaiser Permanente Southern California’s electronic health records, annotated for 25 and 47 predefined categories, respectively. Three hundred reports of each type were randomly selected for validation, and the remainder were used for training. Model performance was assessed using accuracy, precision, recall, and F1-score at 2 probability thresholds. The effect of training set size on model performance was also evaluated.
Results: Both models achieved consistently high accuracy, precision, and recall (all ≥90%) across the 5 seed runs for both report types. For echocardiography, BioclinicalBERT reached a mean accuracy of 95.7%, precision of 97.6%, recall of 97.4%, and F1-score of 0.98 at the probability threshold of 0.1; BART-Large-CNN had similar results. For cardiac catheterization, BART-Large-CNN slightly outperformed BioclinicalBERT at the probability threshold of 0.1, with mean accuracy of 94.9% vs 94.3%, precision of 96.7% vs 96.3%, recall of 96.1% vs 95.7%, and an F1-score of 0.96 for both. Most individual categories showed strong performance, though a few (eg, prosthetic mitral valve, right atrial pressure) had lower scores. Performance improved with more training data but plateaued at around 1000 reports.
Discussion and conclusion: Fine-tuned transformer-based LMs can effectively extract structured data from unstructured cardiac reports, supporting automated information extraction to enhance research and clinical applications.
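The abstract reports accuracy, precision, recall, and F1-score at a probability threshold but does not spell out how extracted answers were scored. The following is a minimal sketch of one plausible scheme, not the authors' implementation. It assumes that an answer whose model probability falls below the threshold is treated as "no answer", that a correct accepted answer is a true positive, that any other accepted answer is a false positive, and that a rejected answer for an annotated category is a false negative. The names `Prediction`, `normalize`, `evaluate`, and `f1_score` are hypothetical, and the toy rows are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Prediction:
    gold: Optional[str]  # annotated value; None if the report has no value for the category
    answer: str          # span returned by the question-answering model
    score: float         # model probability attached to that span


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing strings."""
    return " ".join(text.lower().split())


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(preds: List[Prediction], threshold: float) -> Dict[str, float]:
    """Score predictions under the ASSUMED counting scheme described above."""
    tp = fp = fn = tn = 0
    for p in preds:
        # Assumption: answers below the threshold count as "no answer".
        predicted = p.answer if p.score >= threshold else None
        if predicted is None:
            if p.gold is None:
                tn += 1   # correctly abstained
            else:
                fn += 1   # missed an annotated value
        elif p.gold is not None and normalize(predicted) == normalize(p.gold):
            tp += 1       # accepted answer matches the annotation
        else:
            fp += 1       # accepted answer is wrong or unsupported
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Invented toy rows, not data from the study.
    toy = [
        Prediction(gold="55%", answer="55%", score=0.90),
        Prediction(gold="mild", answer="moderate", score=0.60),
        Prediction(gold=None, answer="yes", score=0.05),
        Prediction(gold="present", answer="present", score=0.08),
    ]
    for t in (0.05, 0.1):
        print(t, evaluate(toy, t))
```

Under this scheme, lowering the threshold accepts more answers, which tends to raise recall at the cost of precision. As a consistency check on the reported figures, `f1_score(0.976, 0.974)` evaluates to roughly 0.975, in line with the F1-score of 0.98 reported for BioclinicalBERT on echocardiography reports.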
Reported by Xie et al.