What question did this study set out to answer?

The study aims to develop a unified model for Biomedical Named Entity Recognition (BioNER) that optimises training across multiple datasets without sacrificing performance or adding architectural complexity.

May 21, 2026 | Open Access

Loss masking-based gradient optimisation: A new approach for training supervised biomedical named entity recognition models using multi-dataset

Key Points

Proposed a Loss-Masking Optimisation framework for BioNER that uses a dataset-aware masking strategy.
Extended a standard BERT-based NER pipeline with a tag-masking array to reduce cross-dataset interference.
Trained a single BioNER model on 16 biomedical NER datasets to evaluate performance across them.
Achieved higher precision and overall F1 scores compared to conventional multi-dataset training methods.
Some datasets showed performance gains, while others stayed near baseline and a few declined.
These mixed results highlight the complex effects of dataset interactions on model performance.

Abstract

The ever-expanding biomedical literature necessitates an efficient and robust mining platform, with the foundational step being a reliable Biomedical Named Entity Recognition (BioNER) system. Existing approaches, such as multi-task and collaborative learning, have attempted to address dataset heterogeneity but often rely on complex architectures with task-specific layers, limiting scalability. A key research gap is the development of a unified model that optimises across multiple datasets without sacrificing performance or introducing architectural complexity. In this study, we propose a novel Loss-Masking Optimisation framework for BioNER models that enables multi-dataset training via a dataset-aware masking strategy. This approach extends the standard BERT-based NER pipeline by introducing a tag-masking array that nullifies logits for tags absent in the originating dataset, thereby reducing cross-dataset interference. Using this methodology, we trained a single BioNER model across all 16 biomedical NER datasets, achieving higher precision and overall F1 scores than conventional multi-dataset training. While some datasets showed performance gains, others stayed near baseline, and a few declined, underscoring the nuanced impact of dataset interactions. To the best of our knowledge, this is among the first studies to apply a dataset-aware loss-masking mechanism to unified multi-dataset BioNER training, offering a scalable alternative to multi-task architectures.
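
The tag-masking idea described in the abstract can be illustrated with a small sketch. The following is not the authors' implementation: the tag names, the dataset names (`chem_corpus`, `disease_corpus`), and the helper functions are all hypothetical, and it assumes that "nullifying" a logit means excluding that tag from the softmax so it receives zero probability and contributes no loss. It uses plain Python rather than a BERT pipeline to keep the example self-contained.

```python
import math

# Hypothetical unified tag set covering every dataset (illustrative names only).
TAGS = ["O", "B-Chemical", "I-Chemical", "B-Disease", "I-Disease"]

# Hypothetical datasets and the tags each one actually annotates.
DATASET_TAGS = {
    "chem_corpus": {"O", "B-Chemical", "I-Chemical"},
    "disease_corpus": {"O", "B-Disease", "I-Disease"},
}


def build_tag_mask(dataset):
    """Return a 0/1 list over TAGS: 1 if the dataset annotates that tag."""
    present = DATASET_TAGS[dataset]
    return [1 if tag in present else 0 for tag in TAGS]


def masked_log_softmax(logits, mask):
    """Log-softmax restricted to unmasked tags; masked tags get -inf.

    Assumption: masked logits are dropped from the normaliser entirely,
    so tags absent from the originating dataset cannot affect the loss.
    """
    kept = [x for x, m in zip(logits, mask) if m]
    top = max(kept)  # subtract the max for numerical stability
    log_z = top + math.log(sum(math.exp(x - top) for x in kept))
    return [x - log_z if m else float("-inf") for x, m in zip(logits, mask)]


def masked_token_loss(logits, gold_tag, dataset):
    """Cross-entropy for one token, using the mask of its source dataset."""
    mask = build_tag_mask(dataset)
    gold = TAGS.index(gold_tag)
    if not mask[gold]:
        raise ValueError(f"{gold_tag!r} is not annotated in {dataset!r}")
    return -masked_log_softmax(logits, mask)[gold]


if __name__ == "__main__":
    logits = [1.0, 2.0, 0.5, 3.0, -1.0]  # one token's scores over TAGS
    print(build_tag_mask("chem_corpus"))
    print(masked_token_loss(logits, "B-Chemical", "chem_corpus"))
```

In this sketch, a token from `chem_corpus` is scored only against the tags that corpus annotates, so a high logit for `B-Disease` does not penalise the model on a dataset that never labels diseases. A real training loop would apply the same mask per batch to the model's logits before computing the loss.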
