July 9, 2012 · Open Access

Concept annotation in the CRAFT corpus

Abstract

With more than 560,000 tokens in the initial 67-article release (and more than 790,000 tokens in the full set), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, it consists of journal articles that are drawn from diverse biomedical disciplines and marked up in their entirety. Additionally, with nearly 100,000 concept annotations in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

Authors

Michael Bada

University of Chicago

Miriam Eckert

University of Colorado System

Donald L. Evans

University of Colorado Boulder

Journals

BMC Bioinformatics

Institutions

University of Colorado Boulder

University of Colorado Anschutz Medical Campus

Jackson Laboratory
