September 17, 2024 · Open Access

HunFlair2 in a cross-corpus evaluation of biomedical named entity recognition and normalization tools

Mario Sänger (Humboldt-Universität zu Berlin), Samuele Garda (Humboldt-Universität zu Berlin), Xing David Wang (Humboldt-Universität zu Berlin)

Abstract

With the exponential growth of the life sciences literature, biomedical text mining (BTM) has become an essential technology for accelerating the extraction of insights from publications. The identification of entities in texts, such as diseases or genes, and their normalization, i.e. grounding them in a knowledge base, are crucial steps in any BTM pipeline to enable information aggregation from multiple documents. However, tools for these two steps are rarely applied in the same context in which they were developed. Instead, they are applied "in the wild", i.e. on application-dependent text collections that differ moderately to extremely from those used for training, e.g. in focus, genre, or text type. This raises the question of whether the reported performance, usually obtained by training and evaluating on different partitions of the same corpus, can be trusted for downstream applications.
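Why normalization enables aggregation across documents can be illustrated with a toy, dictionary-based sketch. This is not the method of HunFlair2 or of any tool evaluated in the paper; the lexicon, the `KB:` identifiers, and the functions `recognize`, `normalize`, and `aggregate` are hypothetical and exist only to show the two pipeline steps (recognition, then grounding) and how different surface forms collapse onto one identifier.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: surface form -> knowledge-base identifier.
# The "KB:" IDs are placeholders, not real ontology codes.
LEXICON = {
    "heart attack": "KB:0001",
    "myocardial infarction": "KB:0001",
    "brca1": "KB:0002",
    "breast cancer 1": "KB:0002",
}


def recognize(text):
    """Step 1 (toy NER): return sorted (start, end, mention) spans via case-insensitive lexicon lookup."""
    spans = []
    for surface in LEXICON:
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spans.append((m.start(), m.end(), m.group(0)))
    return sorted(spans)


def normalize(mention):
    """Step 2 (toy normalization): ground a mention to a KB identifier, or None if unknown."""
    return LEXICON.get(mention.lower())


def aggregate(docs):
    """Map each KB identifier to the set of document IDs that mention it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for _, _, mention in recognize(text):
            kb_id = normalize(mention)
            if kb_id is not None:
                index[kb_id].add(doc_id)
    return dict(index)


docs = {
    "d1": "The patient suffered a heart attack.",
    "d2": "Risk factors for myocardial infarction were assessed.",
    "d3": "Mutations in BRCA1 were detected.",
}

for kb_id, doc_ids in sorted(aggregate(docs).items()):
    print(kb_id, sorted(doc_ids))
```

Because "heart attack" and "myocardial infarction" are grounded to the same identifier, documents d1 and d2 are aggregated together even though they share no surface string; real tools replace the exact-match lookup with trained recognition and normalization models, which is exactly where cross-corpus robustness matters.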

Cite This Study

Sänger et al. (2024) studied this question.

synapsesocial.com/papers/68e5832cb6db6435875203dc
https://doi.org/10.1093/bioinformatics/btae564