Key points are not available for this paper at this time.
Background The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention. Objective This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records. Methods The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People’s Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert’s annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. Additionally, we evaluated its performance via a quantified version of Qwen-14B-Chat on a consumer-grade GPU. Results The pipeline demonstrates a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantified version of QWEN delivered an enhanced performance with 97.28% accuracy and a 0% null ratio. Conclusions The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records.
Building similarity graph...
Analyzing shared references across papers
Loading...
Lei Wang
Yinyao Ma
Wenshuai Bi
Journal of Medical Internet Research
BGI Group (China)
Center for Life Sciences
The People's Hospital of Guangxi Zhuang Autonomous Region
Building similarity graph...
Analyzing shared references across papers
Loading...
Wang et al. (Wed,) studied this question.
www.synapsesocial.com/papers/68e792b5b6db6435877037a2 — DOI: https://doi.org/10.2196/54580