Fungi exhibit diverse trophic strategies, ranging from obligate symbiosis to saprotrophy, with some taxa capable of occupying multiple ecological roles. Manually identifying trophic versatility from the literature is time-consuming and difficult to scale. Here, we present a pilot workflow that automates the classification of fungal trophic modes using transformer-based language models. A curated dataset of 56 fungal ecology abstracts was manually labelled as dual (occupying multiple trophic modes) or solo (restricted to one mode) and used to fine-tune four models: BioBERT, BERT-base-cased, BERT-base-uncased and BiodivBERT. Stratified 5-fold cross-validation showed that BioBERT and BERT-base-cased performed equally well (~89% accuracy, with balanced precision and recall), suggesting that case sensitivity matters in taxonomic text. BiodivBERT and BERT-base-uncased underperformed, indicating that domain adaptation alone is not sufficient. This pilot study emphasises reproducibility, transparency and open data integration, offering a generalisable proof-of-concept for linking literature-derived ecological information to existing fungal trait databases such as FUNGuild and FungalTraits. All code and data are openly available to support reuse and scaling to larger datasets.
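The evaluation scheme named above, stratified 5-fold cross-validation scored by accuracy, precision and recall, can be illustrated with a minimal sketch. This is not the authors' pipeline: it is written in plain Python so that it runs without any machine-learning libraries, the helper names `stratified_folds` and `precision_recall_accuracy` are hypothetical, and the 28/28 class split is an assumption made purely for illustration, since the abstract reports only the total of 56 abstracts. The fine-tuning step itself is left as a comment.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign every index to one of k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped so that the overall fold sizes stay balanced too.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def precision_recall_accuracy(y_true, y_pred, positive="dual"):
    """Binary precision, recall and accuracy, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, correct / len(pairs)


# ASSUMPTION: a 28/28 split between classes; the real label balance is not
# stated in the abstract.
labels = ["dual"] * 28 + ["solo"] * 28
folds = stratified_folds(labels, k=5, seed=0)

for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A model would be fine-tuned on train_idx here and scored on test_idx
    # with precision_recall_accuracy; that step is omitted in this sketch.
    n_dual = sum(labels[i] == "dual" for i in test_idx)
    print(f"fold {fold_id}: train={len(train_idx)} test={len(test_idx)} dual_in_test={n_dual}")
```

Each abstract lands in exactly one test fold, and both classes appear in every fold in near-equal numbers, which is the point of stratifying on a dataset this small. In practice a library routine such as scikit-learn's `StratifiedKFold` would likely replace the hand-written splitter.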
Beatrice Bock studied this question.