Soybean (Glycine max (L.) Merr.) is a globally important crop with a rapidly expanding body of genomics literature driven by advances in sequencing and functional genomics. Thousands of studies reference soybean genes using standardized Glyma identifiers; however, systematic analyses of how these identifiers are distributed across chromosomes in the scientific literature remain limited. Here, we present a chromosome-resolved bibliometric analysis of soybean gene mentions using a reproducible rule-based text mining approach. PubMed abstracts published between December 2006 and December 2025 were mined for standardized Glyma gene identifiers using regular-expression-based entity extraction. A total of 377 PubMed records were retrieved, of which 340 abstracts (90.2%) contained at least one Glyma gene identifier. The median number of unique genes mentioned per abstract was 1, with a maximum of 14 genes reported in a single abstract. Our results reveal three major patterns. First, soybean genomics research remains predominantly gene-centric, with most abstracts referencing one or two genes. Second, apparent chromosome-level disparities exist in literature representation within the subset of studies using standardized Glyma identifiers, with chromosomes 3 and 16 exhibiting the highest frequencies of unique gene mentions. A chi-square goodness-of-fit test confirmed that these differences deviate significantly from a uniform distribution (χ² = 123.71, p < 0.001), indicating non-random patterns of gene reporting. Third, a small subset of genes dominates the literature, while the majority of annotated genes are mentioned infrequently, reflecting a long-tailed distribution of research attention. This analysis captures reporting patterns in studies that explicitly use standardized Glyma identifiers and therefore represents a defined subset of the broader soybean genomics literature.
Within this scope, the findings highlight uneven adoption of standardized gene nomenclature and chromosome-level differences in research emphasis. More broadly, this study demonstrates the utility of transparent, rule-based text mining approaches for large-scale bibliometric analyses in plant science and provides a scalable framework for comparative analyses across crop species.
Kassem et al. (Fri,) studied this question.
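The two computational steps named in the abstract, regular-expression-based extraction of Glyma identifiers and a chi-square goodness-of-fit test against a uniform distribution across chromosomes, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The pattern `GLYMA_RE`, the function names, the assumption of 20 chromosomes, and the choice to count each unique identifier once across the whole corpus are all assumptions made here for illustration; the pattern is meant to cover the two common identifier styles (`Glyma03g12345` and `Glyma.03G000100`) and ignores scaffold-based identifiers.

```python
import re
from collections import Counter

# Hypothetical pattern (not taken from the source): matches identifiers such as
# Glyma03g12345 and Glyma.03G000100, capturing the two-digit chromosome number.
GLYMA_RE = re.compile(r"\bGlyma\.?(\d{2})[gG](\d{5,6})\b")


def extract_glyma_ids(abstract):
    """Return the unique (identifier, chromosome) pairs found in one abstract."""
    return {(m.group(0), int(m.group(1))) for m in GLYMA_RE.finditer(abstract)}


def chromosome_counts(abstracts, n_chromosomes=20):
    """Count unique identifiers per chromosome across a list of abstracts.

    Assumption: each distinct identifier string is counted once for the corpus.
    """
    unique = set()
    for text in abstracts:
        unique |= extract_glyma_ids(text)
    counts = Counter(chrom for _, chrom in unique if 1 <= chrom <= n_chromosomes)
    return [counts.get(c, 0) for c in range(1, n_chromosomes + 1)]


def chi_square_uniform(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(observed)
    if total == 0:
        raise ValueError("no observations to test")
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


if __name__ == "__main__":
    # Toy abstracts invented for this example; they are not data from the study.
    toy = [
        "Overexpression of Glyma.03G000100 and Glyma03g12345 improved tolerance.",
        "Glyma.16G000200 was characterized alongside Glyma.03G000100.",
        "This abstract mentions no standardized identifiers.",
    ]
    counts = chromosome_counts(toy)
    print(counts)
    print(chi_square_uniform(counts))
```

The sketch returns only the test statistic. With 20 categories the test has 19 degrees of freedom, and a p-value could be obtained from a statistics library, for example `scipy.stats.chisquare`, which defaults to equal expected frequencies.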