March 12, 2026
Open Access

Towards Sustainable Curation: Evaluation of Cost and Accuracy of AI Tools in Scaling Annotation Tasks in Curation of Biomedical Literature

Key Points

Evaluated the cost and accuracy of language models for annotating biomedical literature.
Compared four models: GPT-4, Llama 3, Gemma 2, and Mixtral 8x7B.
Assessed performance on population group curation, a lightweight annotation task.
Evaluated the cost-effectiveness of each model for annotation.
Found that performance varied across the four models.
Highlighted opportunities to reduce annotation costs.
Suggested that AI tools could support more sustainable biomedical curation, particularly when resources are limited.

Abstract

Here we compare the performance and cost of four language models (GPT-4, Llama 3, Gemma 2, and Mixtral 8x7B) on the lightweight task of population group curation. Our findings offer guidance on sustainable curation practices when resources are limited.
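A cost-versus-accuracy comparison of this kind can be sketched as follows. This is an illustration, not the study's method. The `ModelRun` schema, the helper names, token-based pricing, exact-match accuracy, and the "cost per correct label" summary are all assumptions. The model names, labels, token counts, and prices in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """One model's outputs and token usage on an annotation set (hypothetical schema)."""
    name: str
    predictions: list[str]    # predicted population-group label, one per paper
    input_tokens: int         # total prompt tokens consumed
    output_tokens: int        # total completion tokens generated
    usd_per_1k_input: float   # assumed price per 1,000 input tokens
    usd_per_1k_output: float  # assumed price per 1,000 output tokens


def count_correct(predictions: list[str], gold: list[str]) -> int:
    """Number of papers whose predicted label exactly matches the curator's label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    return sum(p == g for p, g in zip(predictions, gold))


def total_cost(run: ModelRun) -> float:
    """Token-based cost in USD (a self-hosted model might be costed in GPU-hours instead)."""
    input_cost = (run.input_tokens / 1000) * run.usd_per_1k_input
    output_cost = (run.output_tokens / 1000) * run.usd_per_1k_output
    return input_cost + output_cost


def summarize(runs: list[ModelRun], gold: list[str]) -> list[tuple[str, float, float, float]]:
    """Return (name, accuracy, cost, cost per correct label), cheapest per correct label first."""
    rows = []
    for run in runs:
        correct = count_correct(run.predictions, gold)
        accuracy = correct / len(gold) if gold else 0.0
        cost = total_cost(run)
        cost_per_correct = cost / correct if correct else float("inf")
        rows.append((run.name, accuracy, cost, cost_per_correct))
    return sorted(rows, key=lambda row: row[3])


# Hypothetical curator labels and model outputs for four papers.
gold = ["European", "East Asian", "African", "European"]
runs = [
    ModelRun("model_a", ["European", "East Asian", "African", "South Asian"], 4000, 200, 0.01, 0.03),
    ModelRun("model_b", ["European", "European", "African", "European"], 4000, 200, 0.0005, 0.0005),
]

for name, acc, cost, per_correct in summarize(runs, gold):
    print(f"{name}: accuracy={acc:.2f}, cost=${cost:.4f}, cost/correct=${per_correct:.4f}")
```

Ranking by cost per correct label folds both axes into a single number. In this toy example both models reach the same accuracy, so the cheaper one ranks first. In practice it may be safer to report accuracy and cost side by side, since a small accuracy gap can matter more to curators than a price difference.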
