Case reports and case series make up a significant portion of the biomedical literature, yet the National Library of Medicine indexes only case reports, not case series, as a Publication Type. This gap hampers the ability of clinicians and researchers to retrieve, identify and analyze evidence from this type of study. We characterized PubMed articles mentioning “case series” in the title or abstract to learn what the authors themselves consider to be case series. We then set aside articles better indexed as other standard publication types (case reports, cohort studies, reviews and clinical trials), as well as those that discuss (rather than report the results of) case series, to create a corpus of typical case series articles. A random sample of these articles was evaluated by two annotators, who confirmed that the great majority (88%) satisfy a formal definition of “case series”. The corpus was then used to train an automated transformer-based machine learning indexing model. The model's performance on case series in hold-out data was excellent (precision = 0.887, recall = 0.952, F1 = 0.918, PR-AUC = 0.941), and manual evaluation of 100 articles tagged as “case series” revealed that 88% satisfied a formal definition of case series. This study demonstrates the feasibility of automatically indexing case series articles. Indexing should enhance their discoverability, and hence their medical value, for evidence synthesis groups as well as clinicians and general users of the biomedical literature.
Shahidehpour et al. (Fri,) studied this question.
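As a quick consistency check on the hold-out metrics quoted in the abstract, the F1 score can be recomputed as the harmonic mean of the reported precision and recall. The sketch below is illustrative only: the function name `f1_score` and the variable names are assumptions introduced here, not part of the authors' code, and only the two reported figures are taken from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for the "case series" class.
reported_precision = 0.887
reported_recall = 0.952

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # prints 0.918
```

The recomputed value agrees with the reported F1 of 0.918 to three decimal places.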