As botanists, we recognize that herbarium specimens document spatial and temporal patterns of plant diversity. But they may also tell stories beyond the plants themselves: about how a specimen may have languished for decades in obscurity until a botanist with a keen eye recognized it as a new species (Ziziphus celata; see, e.g., Judd and Hall, 1984; for image, see https://www.floridamuseum.ufl.edu/scripts/dbs/herbs_project/herbsproject/herbs_pub_proc.asp?accno=136888 https://www.idigbio.org/content/portal-curiosities-asa-gray-and-quest-shortia-galacifolia-%E2%80%93-case-study-importance) or of the metamorphosing nature of collections themselves, with specimens yielding data undreamt of when they were collected (e.g., Inga umbellifera; Hart et al., 2016; for image, see http://elmer.rbge.org.uk/bgbase/vherb/bgbasevherb.php?cfg=bgbase/vherb/fulldetails.cfg Wieczorek et al., 2012), improved methods for imaging herbarium specimens (e.g., Tegelberg et al., 2014), and the development of high-throughput workflows (as implemented, for example, in the Paris Herbarium; https://www.idigbio.org/sites/default/files/workshop-presentations/spnhc2014/02SPNHC2014chagnoux.pdf), the pace of digitization has increased dramatically. The Global Plants Initiative (GPI; https://plants.jstor.org/), a push to digitize the world's botanical type specimens, demonstrated both the global will and collaborative spirit of herbarium leaders from around the world and the feasibility of large-scale, international, collaborative digitization efforts. In many ways, GPI set the stage for the remarkable digital herbarium resources available today.
Data aggregators, such as the Global Biodiversity Information Facility (GBIF), the Atlas of Living Australia (ALA), the U.S. Geological Survey's portal (Biodiversity Information Serving Our Nation; BISON), and iDigBio (Integrated Digitized Biocollections, funded by the U.S. National Science Foundation (NSF); see below), provide digital biodiversity data, including information from herbarium specimens, to the public using standardized terminology as specified in Darwin Core. Other national and regional aggregators also serve biodiversity information, and collectively these aggregators provide data that allow visualization and analysis of patterns of biodiversity in novel and exciting ways. Although decades have passed since the first herbarium records were databased and made available via the internet, it is only within the past few years that sufficient numbers of specimen records have become available for large-scale, innovative research. We are just now at the brink of new opportunities for synthetic analyses that connect digitized specimen data with other resources (phylogenetic, climatological, genomic, etc.; e.g., Soltis and Soltis, 2016) to address both novel questions in plant biology and longstanding questions from new perspectives and at larger scales.
iDigBio (www.idigbio.org) was founded in 2011 to collect and share the rapidly increasing volume of digitized specimen data flowing from new investments in digitization technology and workflows funded by NSF. As of this writing, iDigBio serves over 105 million specimen records from US and international natural history collections. Of these, approximately 50 million are herbarium records. Moreover, iDigBio serves over 22 million media records, most of which are images, with nearly 20 million representing herbarium specimens. What a resource for botanical research and education, and many more millions are in the digitization pipeline!
When NSF began funding for the Advancing Digitization of Biodiversity Collections Program in 2011, the botanical community immediately emerged as a collaborative, organized network of institutions with shared goals and methods for digitizing herbarium specimens.
Through the joint efforts of iDigBio and several digitization projects, including both thematic and regional networks, best practices for herbarium digitization have been developed (Nelson et al., 2015), and these methods continue to spur digitization. Efforts to develop an online U.S. Virtual Herbarium portal (with plant-specific resources, including maps, checklists, and literature, and access to all of iDigBio's herbarium records) are currently underway, with an anticipated completion date of 2018.
Digital repositories provide fast, easy, and cheap access to millions of specimens for anyone with an internet connection. While the goal of bringing museum specimens out of the cabinets and onto the internet is itself worthwhile for many reasons, the real value of digitization lies in the novel uses of digital data and images.
Given that date of collection is a key element of an herbarium label and is typically captured in online databases, it is not surprising that herbarium label data have been used to study shifts in plant phenology (that is, the seasonal timing of life-history events such as flowering and leaf-out) associated with climate change (e.g., Miller-Rushing et al., 2006; Hart et al., 2014). For example, Miller-Rushing and Primack (2008) demonstrated shifts in flowering time in Massachusetts over the past century and a half using a combination of Henry David Thoreau's notes and modern herbarium records, and this approach of harvesting flowering times and bud burst from herbarium specimens has been used increasingly to address questions of phenological shifts from both regional and phylogenetic perspectives (see review by Willis et al., 2017). However, the study by Miller-Rushing and Primack (2008) relied on visits to herbaria; the recent, rapid increase in online digital specimen records enables similar studies on greater geographic scales, as well as in remote areas with few herbaria (e.g., Hart et al., 2014).
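As a rough illustration of how collection dates can feed a phenological analysis, the sketch below converts each specimen's collection date to a day of year and fits an ordinary least-squares line of day of year against year. It is a minimal sketch under stated assumptions, not the method of any study cited here: the function names are hypothetical, the dates are invented, every record is assumed to be a flowering specimen of one species from one region, and each date is assumed to be a complete ISO date (as a Darwin Core `eventDate` value often, but not always, is).

```python
from datetime import date


def day_of_year(iso_date: str) -> int:
    """Return the day of year (1-366) for a 'YYYY-MM-DD' date string."""
    return date.fromisoformat(iso_date).timetuple().tm_yday


def flowering_trend(collection_dates):
    """Fit day-of-year against year by ordinary least squares.

    collection_dates: iterable of 'YYYY-MM-DD' strings (hypothetical input;
    assumed to be flowering specimens of a single species).
    Returns (slope, intercept), where slope is in days per year; a negative
    slope means specimens were collected in flower earlier over time.
    """
    points = [(date.fromisoformat(d).year, day_of_year(d)) for d in collection_dates]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two records")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need records from at least two different years")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Invented example dates, not real specimen data.
    example = ["1901-05-20", "1951-05-15", "2001-05-10"]
    slope, intercept = flowering_trend(example)
    print(f"trend: {slope:.3f} days per year")
```

With these three invented dates the fitted slope is about -0.1 days per year, that is, flowering roughly one day earlier per decade. A real analysis would also need to handle partial or ranged dates and the collection biases discussed in the text.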
Despite the apparent utility of herbarium records for phenological research, it is clear that such data are not without biases, and the analysis and accommodation of these biases are intriguing research avenues in their own right (Davis et al., 2015; Meyer et al., 2016). As digital resources grow, with aggregators pulling together data from around the globe, the potential number of specimen records for a given study will be sufficiently large to ameliorate at least some of the biases that arise when specimens are used for purposes beyond those for which they were collected.
Herbarium label data, like label data for other museum specimens, can also be rich sources of locality information that can be used, in conjunction with environmental data, to generate species distribution models (also referred to as ecological niche models). Accurate models rely on large numbers of data points, and sufficient data can most effectively (and sometimes only) be obtained from the vast repositories of digitized herbarium records. Such ecological niche models have been instrumental in predicting responses of species to climate change and are now being applied to questions about the role of ecological shifts associated with speciation at both the diploid and polyploid levels (e.g., Marchant et al., 2016; Visger et al., 2016).
In the Northern Hemisphere, we expect that plant species will move northward or to higher elevations, if they are able, in response to a warming climate. However, predicted plant migrations are more complex in some biodiversity hotspots, such as California, where models for plants along elevational gradients predict that some species will move to lower (and warmer) elevations rather than upslope as rising temperatures alone would suggest, tracking water availability rather than temperature gradients (see review by Rapacciuolo et al., 2014).
In Florida, species from the northern part of the state, representing the southern end of the eastern deciduous forest, are predicted to move northward by the end of the century, whereas many species from the central part of the state, representing scrub habitats that are already hot and dry, are predicted to move southward, most likely in response to water availability (Fig. 1). The application of locality data to species distribution modeling has further implications for conservation, particularly when used in conjunction with phylogenies to document spatial patterns of phylogenetic diversity (e.g., J. M. Allen et al., University of New Hampshire, unpublished manuscript; Fig. 2). Although ecological niche modeling can be accomplished using data extracted from examination of physical specimens, the availability of digital records greatly extends the scope, scale, and geographic regions that can be explored and improves the quality of the models, enabling research that would otherwise be impossible.
Fig. 1. Maps showing the distribution of vascular plant species diversity in Florida based on ecological niche models developed from locality data from >500,000 specimen records, Bioclim variables (Hijmans et al., 2005), and Maxent software (Phillips et al., 2006). (A) Present diversity for the 1500 species of vascular plants in Florida (of the 4200 species) for which sufficient numbers of digital herbarium records were available to construct models. (B) Changes in the distribution of species diversity predicted in 2050 relative to the present. Green areas indicate increased species diversity, and tan areas represent decreased species diversity relative to the present (C. C. Germain-Aubrey et al., University of Florida, unpublished data). # = number.
Fig. 2. (A) Map showing the distribution of vascular plant phylogenetic diversity in Florida based on ecological niche models (from Fig. 1) and a phylogeny of Florida plants based on rbcL and matK sequences (see J. M. Allen et al., University of New Hampshire, unpublished manuscript, for methods). Note that some areas, such as southern peninsular Florida, that show low species diversity (Fig. 1A) have intermediate levels of phylogenetic diversity, indicating that the species present represent divergent branches of the phylogeny. (B) Map showing endemism hotspots for vascular plants of Florida based on ecological niche models (from Fig. 1); endemism hotspots were identified by dividing endemic species diversity by total species diversity (C. C. Germain-Aubrey et al., University of Florida, unpublished data).
The use of digitized herbarium records for phenological and ecological research only begins to demonstrate the potential of these data. Largely unexplored to date is the analysis of specimen images, yet high-throughput methods of image analysis are under development and are being applied to questions of species identification (e.g., Unger et al., 2016; Carranza-Rojas et al., 2017) as well as large-scale spatial phenological patterns (see Willis et al., 2017, for discussion). Although the resolution of images varies among collections (but note that GPI specifies a minimum resolution of 600 dots per inch), images are nearly untapped sources of morphological characters and functional traits, and methods that allow extraction of such traits will enable multiple new avenues of exploration. For example, imagine being able to use hundreds or thousands of images to score morphological characters (whether for systematic or ecological study), examine color and infer pigment concentrations and identifications, measure stomatal and trichome densities, or survey “paleoherbivory” via analysis of insect damage on digitized fossil leaves, to suggest just a few.
High-throughput phenotyping of herbarium specimen images, via any of several methods of data capture and analysis (Gehan and Kellogg, 2017), can revolutionize studies of plant ecology and evolution and can provide ties to phenomic studies that integrate genotypes and phenotypes. Beyond the images themselves, the textual descriptions of locality information, habitats, and associated species can potentially be mined for key words that correspond to traits or features of interest. Linking traits inferred from specimens to databases such as the TRY Plant Trait Database (https://www.try-db.org/de/de.php) can yield powerful new data sets for exploring a range of questions in studies of plant diversity. Refinement and application of text-parsing algorithms, coupled with the development of ontologies and data standards for new characters (see Stucky et al., 2016), will likewise lead to further use of specimen data in ecological and evolutionary research.
Novel applications of digitized herbarium data are beginning to appear. For example, online herbarium resources have augmented literature-based information on medicinal uses of plants (Souza and Hawkins, 2017). Tools that enable integration of digitized herbarium specimens with phylogenies and other resources (e.g., Soltis and Soltis, 2016) will also lead to new discoveries. And, of course, specimen images and digitized data continue to have key roles to play in systematics, the longstanding focus for most herbarium use.
The world's herbaria are transforming, and collectively they offer new avenues for synthetic research that can address pressing societal problems related to climate change, food security, and conservation. Champions of herbaria have long promoted the value of collections, and, with digitization and technological breakthroughs in imaging, molecular biology, and genetics, herbaria, like fine wine, seem to continue to increase in value with time.
I thank Mark Whitten, Dick Olmstead, Chuck Davis, and Pete Hollingsworth for inspiring discussions about their favorite herbarium specimens and Editor-in-Chief Pamela Diggle and three anonymous reviewers for helpful suggestions on an earlier draft of this manuscript. This work was supported by NSF grant DBI-1547229.
Pamela S. Soltis studied this question.