Exploring the boundaries: gene and protein identification in biomedical text

Abstract

BACKGROUND: Good automatic information extraction tools offer hope for processing the rapidly growing biomedical literature, and successful named entity recognition is a key component of such tools.

METHODS: We present a maximum-entropy-based system that incorporates a diverse set of features for identifying gene and protein names in biomedical abstracts.

RESULTS: This system was entered in the BioCreative comparative evaluation. It achieved a precision of 0.83 and a recall of 0.84 in the "open" evaluation, and a precision of 0.78 and a recall of 0.85 in the "closed" evaluation.

CONCLUSION: The central contributions are a rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources, including full MEDLINE abstracts and web searches.
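
For readers who want a single figure to compare the two evaluations, the precision and recall values quoted above can be combined into a balanced F-measure (the harmonic mean of the two). The following is a minimal sketch, not part of the paper: the function name `f1_score` and the dictionary layout are illustrative assumptions, and because the inputs are the rounded values from the abstract, the results are approximate and may differ slightly from any F-scores reported in the full paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are zero
    return 2 * precision * recall / (precision + recall)


# Rounded (precision, recall) pairs as quoted in the abstract.
reported = {"open": (0.83, 0.84), "closed": (0.78, 0.85)}

# Hypothetical helper structure: F1 per evaluation track.
f1_by_track = {track: f1_score(p, r) for track, (p, r) in reported.items()}

for track, f1 in f1_by_track.items():
    print(f"{track}: F1 = {f1:.3f}")
```

Because the harmonic mean penalises imbalance between the two values, the "closed" result is pulled down by its lower precision even though its recall is slightly higher.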
