Software is central to modern science, yet references to it in scholarly articles are often incomplete or inconsistent, hindering reproducibility and reuse. Existing methods for mining software mentions, such as rule-based and conventional NLP approaches, remain limited in scalability and robustness. The emergence of large language models (LLMs) offers new opportunities for improving this task. LLMs exhibit strong contextual reasoning and adaptability, making them well suited to extracting software mentions from heterogeneous academic texts. In this paper, we evaluate several LLM-based approaches using three gold-standard corpora, comparing prompting strategies and configurations against established baselines. Our contributions are threefold: (1) we provide the first systematic evaluation of LLMs for software mention extraction, (2) we analyse their strengths and weaknesses relative to prior techniques, and (3) we discuss implications for reproducibility and open science. Results show that LLMs significantly improve extraction accuracy and adaptability, advancing efforts to integrate software into the scholarly record.
David Pride
The Open University
Matteo Guenci
Martin Dočekal
Brno University of Technology
University of Bologna
Pride et al. studied this question.
synapsesocial.com/papers/69a765d1badf0bb9e87da94d — DOI: https://doi.org/10.1109/jcdl67857.2025.00041