Incunabula, or books printed before 1500, are extremely difficult and expensive to convert to digital form. The primary challenges arise from the use of non-standard typographical glyphs, based on medieval handwriting, to abbreviate words. Further difficulties are posed by the practices of inconsistently marking word breaks at the ends of lines and of reducing or even eliminating the spacing between some words. As such, these documents form a distinct genre of electronic document that poses unique challenges for conversion to digital form. From 2005 to 2007, the Preservation and Access Research and Development Program at the National Endowment for the Humanities funded a study to explore methods for digitizing these difficult texts. This paper describes some of the results of that project.

The work described in this paper was completed by the Approaching the Problems of Digitizing Latin Incunables project, funded by the National Endowment for the Humanities Division of Preservation and Access. The material in this paper is drawn from the project application, internal technical reports, grant project reports and the project descriptions included in and . Much of this work was inspired by Ross Scaife and his work building a corpus of Latin Colloquia. I am deeply grateful for Ross's comments, advice and support. A version of this paper will also be published as part of the project web site.
Jeffrey A. Rydberg-Cox studied this question.