Background: The rapidly accumulating scientific literature on HIV makes it difficult to assess the relevant literature accurately and efficiently. This study explores the potential of large language models (LLMs), such as ChatGPT, for selecting relevant studies for a systematic review.
Method: Scientific papers were initially obtained from bibliographic database searches using a Boolean search strategy with pre-defined keywords. From 15,839 unique records, three reviewers manually identified 39 relevant papers based on pre-specified inclusion and exclusion criteria. In the ChatGPT experiment, over 10% of records were randomly chosen as the experimental dataset, including the 39 manually identified manuscripts. These unique records (n=1,680) were screened with ChatGPT-4 using the same pre-specified criteria. Four strategies were employed: standard prompting, i.e., input-output (IO); chain of thought with zero-shot learning (0-CoT); CoT with few-shot learning (FS-CoT); and majority voting (which integrates all three prompting strategies). Model performance was assessed using recall, F-score, and precision.
Results: Recall (the proportion of truly relevant abstracts that the model identified and retrieved from all input records) was 0.82 (IO), 0.97 (0-CoT), and 1.0 for both the FS-CoT and majority voting prompts. F-scores were 0.34 (IO), 0.29 (0-CoT), 0.39 (FS-CoT), and 0.46 (majority voting). Precision was 0.22 (IO), 0.17 (0-CoT), 0.24 (FS-CoT), and 0.30 (majority voting). Computational time was 2.32, 4.55, 6.44, and 13.30 hours for IO, 0-CoT, FS-CoT, and majority voting, respectively. Processing costs for the 1,680 unique records were approximately 63, 73, 186, and 325, respectively.
Conclusion: LLMs such as ChatGPT are viable for systematic reviews, efficiently identifying studies that meet pre-specified criteria. Greater efficacy was observed with a more sophisticated prompt design that integrates the IO, 0-CoT, and FS-CoT prompting techniques (i.e., majority voting). LLMs can expedite the study selection process in systematic reviews compared with manual methods, with minimal cost implications.
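
The majority voting step and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that majority voting means a record is included when at least two of the three prompting strategies vote to include it, and that the reported F-score is the F1 score (the harmonic mean of precision and recall). The function names and the toy decisions are hypothetical and are not data from the study.

```python
def majority_vote(decisions):
    """Return True when more than half of the per-strategy decisions are 'include'.

    Assumption: the abstract does not spell out the voting rule; a simple
    majority over the three strategies (IO, 0-CoT, FS-CoT) is used here.
    """
    return sum(decisions) > len(decisions) / 2


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed to be the paper's F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def screening_metrics(predicted, actual):
    """Compute (precision, recall, F1) for include/exclude screening decisions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical include/exclude decisions for six records (not study data).
io_votes = [True, False, True, True, False, False]
zero_cot_votes = [True, True, True, False, False, True]
fs_cot_votes = [True, True, False, True, False, False]
reviewer_labels = [True, True, False, False, False, False]

# Combine the three strategies record by record.
voted = [
    majority_vote(votes)
    for votes in zip(io_votes, zero_cot_votes, fs_cot_votes)
]
precision, recall, f1 = screening_metrics(voted, reviewer_labels)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Consistency check against the rounded values reported in the abstract.
reported = {
    "IO": (0.22, 0.82, 0.34),
    "0-CoT": (0.17, 0.97, 0.29),
    "FS-CoT": (0.24, 1.0, 0.39),
    "Majority voting": (0.30, 1.0, 0.46),
}
for name, (p, r, f) in reported.items():
    print(f"{name}: F1 from rounded precision/recall = {f1_score(p, r):.3f}, reported {f}")
```

Under the F1 assumption, the F-scores recomputed from the rounded precision and recall values agree with the reported F-scores to within about 0.01, which is the gap expected from rounding the inputs.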
M Naser Lessani
Zhenlong Li
Shan Qiao
Lessani et al. (Thu,) studied this question.
www.synapsesocial.com/papers/68e580c7b6db64358751e05b — DOI: https://doi.org/10.1101/2024.09.18.24313828