February 14, 2026 · Open Access

Enhanced semantic classification of microbiome sample origins using Large Language Models (LLMs)

Key Points

Evaluated how well LLMs can automate the re-annotation of microbiome sequencing metadata to improve its reusability
Used OpenAI GPT models for annotation without fine-tuning
Assessed performance on a benchmark of 1,000 hand-curated examples
Compared proprietary and open-weight LLMs for ecological classification
Applied the optimized pipeline to 2 million environmental sequencing records
Annotation performance exceeded that of a manually curated, keyword-based baseline
Open-weight models achieved accuracy comparable to proprietary ones
Produced coarse-grained, standardized annotations of sample origins worldwide
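
The keyword-based baseline is not described in detail here. As a minimal Python sketch of how such a non-ML classifier could work, the code below counts whole-word keyword hits in free-text metadata and returns the best-scoring biome. The biome labels, the keyword lists in `KEYWORDS`, and the function name `keyword_classify` are hypothetical illustrations, not the study's scheme or implementation.

```python
import re

# Hypothetical biome labels and keyword lists, for illustration only;
# they are NOT the classification scheme or keywords used in the study.
KEYWORDS: dict[str, list[str]] = {
    "marine": ["seawater", "ocean", "marine", "coastal"],
    "freshwater": ["lake", "river", "freshwater", "stream"],
    "soil": ["soil", "rhizosphere", "agricultural field"],
    "sediment": ["sediment", "mud"],
}


def keyword_classify(metadata_text: str) -> str:
    """Return the biome with the most keyword hits, or 'unknown' if none match."""
    text = metadata_text.lower()
    scores: dict[str, int] = {}
    for biome, words in KEYWORDS.items():
        # Whole-word matching, so that e.g. "mud" does not match "muddle".
        scores[biome] = sum(
            len(re.findall(rf"\b{re.escape(word)}\b", text)) for word in words
        )
    # On ties, max() keeps the first biome in dictionary order.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

For example, `keyword_classify("Surface seawater collected at 5 m depth")` returns `"marine"`, whereas `"lake sediment core"` scores one hit each for freshwater and sediment, and this sketch simply keeps the first of the tied labels. Ambiguities like this, and synonyms missing from the lists, are where a keyword approach tends to be brittle.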

Abstract

Over the past decade, central sequence repositories have grown substantially. This vast accumulation of data is valuable and enables further studies, provided that the data entries are well annotated. However, the submitter-provided metadata of sequencing records can be of heterogeneous quality, which makes re-use difficult. Here, we test to what extent large language models (LLMs) can cost-effectively automate the re-annotation of sequencing records against a simplified classification scheme of broad ecological environments relevant to microbiome studies, without fine-tuning. This effort directly contributes to improving the FAIRness (Findability, Accessibility, Interoperability, and Reusability) of microbiome sequencing metadata, thereby enhancing its "AI readiness" for downstream computational analyses. We focused on sequencing samples taken from the environment, for which metadata is particularly important. We employed OpenAI Generative Pre-trained Transformer (GPT) models and assessed scalability, time- and cost-effectiveness, and performance against a diverse, hand-curated benchmark of 1,000 examples spanning a wide range of complexity in metadata interpretation. Annotation performance markedly exceeded that of a baseline, manually curated, non-ML keyword-based approach. Changing models (or model parameters) has only minor effects on performance, but prompts need to be carefully designed to match the task. Furthermore, when we compared proprietary OpenAI models with open-weight alternatives (e.g., Qwen, meta-Llama, and microsoft-phi-4), we found comparable accuracy for both biome and sub-biome classification, indicating that open-weight models can match the performance of proprietary models for large-scale ecological metadata re-annotation.
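
The prompts used in the study are not reproduced here. As a hedged illustration of zero-shot classification against a fixed scheme, the sketch below flattens submitter metadata into a prompt and maps the model's free-text reply back onto the allowed labels. The label list `BIOMES`, the prompt wording, and the helpers `build_prompt` and `parse_label` are hypothetical; the actual model call (for example through the OpenAI API) is deliberately left out.

```python
# Hypothetical simplified scheme; the study's actual categories may differ.
BIOMES = ["marine", "freshwater", "soil", "sediment", "air", "unknown"]

# Hypothetical prompt wording, for illustration only.
PROMPT_TEMPLATE = (
    "You classify the environmental origin of microbiome sequencing samples.\n"
    "Choose exactly one label from this list: {labels}.\n"
    "Answer with the label only.\n\n"
    "Sample metadata:\n{metadata}"
)


def build_prompt(metadata: dict[str, str]) -> str:
    """Flatten non-empty submitter metadata fields into a zero-shot prompt."""
    fields = "\n".join(f"{key}: {value}" for key, value in metadata.items() if value)
    return PROMPT_TEMPLATE.format(labels=", ".join(BIOMES), metadata=fields)


def parse_label(reply: str) -> str:
    """Map a free-text model reply onto the fixed label set, else 'unknown'."""
    cleaned = reply.strip().strip(".").lower()
    return cleaned if cleaned in BIOMES else "unknown"
```

Mapping any reply outside the label set to `"unknown"` keeps every annotation inside the controlled vocabulary, at the cost of discarding near-misses such as `"probably soil"`; a production pipeline might instead retry or request structured output.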
We validated the pipeline on the 1,000 hand-curated samples and then applied the optimized pipeline to 2 million environmental sequencing records, providing coarse-grained yet standardized sample-origin annotations with global coverage. Our work demonstrates that LLMs can effectively simplify and standardize the annotation of complex biological metadata.
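
To make the benchmark comparison concrete, here is a minimal sketch of scoring predicted annotations against hand-curated labels at the biome and sub-biome levels. Plain accuracy, the `Annotation` type, the `accuracy` function, and the rule that a sub-biome only counts as correct when its biome is also correct are assumptions for illustration; the study's exact metrics are not given in this summary.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical two-level label: a broad biome and a finer sub-biome."""
    biome: str
    sub_biome: str


def accuracy(gold: list[Annotation], predicted: list[Annotation]) -> dict[str, float]:
    """Fraction of samples matching the curated labels at each level."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    n = len(gold)
    biome_hits = sum(g.biome == p.biome for g, p in zip(gold, predicted))
    # Assumption: a sub-biome is only correct if its parent biome is correct too.
    sub_hits = sum(
        g.biome == p.biome and g.sub_biome == p.sub_biome
        for g, p in zip(gold, predicted)
    )
    return {"biome": biome_hits / n, "sub_biome": sub_hits / n}
```

For four hypothetical samples in which one biome is wrong and one further sample has only its sub-biome wrong, the function returns `{"biome": 0.75, "sub_biome": 0.5}`, showing how the finer level is necessarily the harder one under this scoring rule.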