Systematic reviews involve the time-intensive screening of titles, abstracts, and full texts to identify relevant studies. This study evaluates the potential of large language models (LLMs) to automate citation screening across three datasets with varying inclusion rates. Six LLMs were tested using zero- to five-shot in-context learning, with demonstrations selected by semantic similarity computed with PubMedBERT. Majority voting and ensemble learning were applied to improve performance. No single LLM consistently excelled across the datasets, and sensitivity and specificity were influenced by each dataset's inclusion rate. Overall, ensemble learning and majority voting improved citation screening performance.
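The pipeline described above combines three pieces: similarity-based selection of few-shot demonstrations, majority voting over per-model include/exclude decisions, and evaluation by sensitivity and specificity. The following is a minimal sketch of those pieces, not the study's implementation. It assumes the embeddings are already computed (the toy vectors stand in for PubMedBERT outputs), and the function names and the tie-break rule (a tie counts as "include", favouring sensitivity) are hypothetical choices. The study's separate ensemble-learning step is not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_demonstrations(query_vec, pool, k):
    """Return the k labelled examples most similar to the query citation.

    pool is a list of (embedding, text, label) tuples. The embeddings are
    assumed to be precomputed by a sentence encoder (e.g. a PubMedBERT-based
    model); here they are plain lists of floats.
    """
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    return ranked[:k]


def majority_vote(decisions):
    """Combine per-model decisions (True = include, False = exclude).

    Hypothetical tie rule: a tie counts as include, which favours sensitivity.
    """
    counts = Counter(decisions)
    return counts[True] >= counts[False]


def sensitivity_specificity(preds, labels):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(p and l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    tn = sum((not p) and (not l) for p, l in zip(preds, labels))
    fp = sum(p and (not l) for p, l in zip(preds, labels))
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy usage: hypothetical embeddings and citations.
pool = [
    ([1.0, 0.0], "RCT of drug X in adults", True),
    ([0.0, 1.0], "Editorial on policy", False),
    ([0.9, 0.1], "Cohort study of drug X", True),
]
demos = select_demonstrations([1.0, 0.05], pool, k=2)
final = majority_vote([True, False, True])  # three models, two vote include
```

The tie rule is worth making explicit because, with low inclusion rates, a screening step that leans toward "include" trades some specificity for fewer missed relevant studies.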
Zhihong Zhang
M. Nezhad
Pallavi Gupta
Columbia University
Columbia University Irving Medical Center
Zhang et al. studied this question.
www.synapsesocial.com/papers/689dfe97d61984b91e13c02a — DOI: https://doi.org/10.3233/shti251264