Key points are not available for this paper at this time.
Generative artificial intelligence (GenAI) in general, and large language models (LLMs) in particular, are highly fashionable. As they have the ability to generate coherent output based on prompts in natural language, they are promoted as tools to free knowledge workers from tedious tasks such as content writing, customer support and routine computer code generation. Unsurprisingly, their application is also attractive to professionals in the research domain, where mundane and laborious tasks, such as literature screening, are commonplace. We evaluate Vertex AI ‘text-bison’, a foundational LLM model, in a real-world academic scenario by replicating parts of a popular systematic review in the information management domain. By comparing the results of a zero-shot LLM-based approach with those of the original study, we gather evidence on the suitability of state-of-the-art general-purpose LLMs for the analysis of scientific content. We show that the LLM-based approach delivers good scientific content analysis performance for a general classification problem (ACC =0.9), acceptable performance for a domain-specific classification problem (ACC =0.8) and borderline performance for a text comprehension problem (ACC ≈0.69). We conclude that some content analysis tasks with moderate accuracy requirements may be supported by current LLMs. As the technology will evolve rapidly in the foreseeable future, studies on large corpora, where some inaccuracies are tolerable, or workflows that prepare large data sets for human processing, may increasingly benefit from the capabilities of GenAI.
Platt et al. (Wed,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: