This paper describes the data infrastructure underlying BibCrit, an open-source web application for AI-assisted biblical textual criticism. We document three components of independent interest to the digital humanities and data mining communities: (1) an eight-tradition morphological corpus pipeline covering 2,285,199 word tokens across the Masoretic Text, Septuagint, Dead Sea Scrolls, Samaritan Pentateuch, Peshitta Syriac, Greek New Testament, Targum (Onkelos and Jonathan), and the Latin Vulgate, all serialized to a shared CSV schema; (2) a versioned, reproducible NLP analysis system that uses structured LLM prompts with SHA-256 cache keys to guarantee that identical inputs produce identical outputs across sessions; and (3) an open REST-style cache API that exposes the accumulated corpus of AI-generated textual analyses, covering fifteen analytical tools and both English and Spanish locales, for harvesting by downstream computational studies. All code, corpus ingestion scripts, and prompt templates are released under Apache 2.0 at github.com/Jossifresben/BibCrit.
Jose Fresco Benaim (Wed,) studied this question.
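To illustrate the second component, the following is a minimal sketch of how a SHA-256 cache key can make LLM-backed analyses repeatable across sessions: the inputs are serialized canonically, hashed, and the stored result is returned whenever the same key recurs. It is not BibCrit's actual implementation. The function name `cache_key`, the class `AnalysisCache`, and the fields `tool`, `passage`, `locale`, `prompt_version`, and `prompt_text` are assumptions made for this example only.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(tool: str, passage: str, locale: str,
              prompt_version: str, prompt_text: str) -> str:
    """Return a SHA-256 hex digest over a canonical encoding of the inputs.

    The field names are hypothetical; the real system may hash other inputs.
    """
    # Sorted keys and fixed separators give the same byte string for the
    # same inputs, regardless of dict ordering or whitespace defaults.
    payload = json.dumps(
        {
            "tool": tool,
            "passage": passage,
            "locale": locale,
            "prompt_version": prompt_version,
            "prompt_text": prompt_text,
        },
        sort_keys=True,
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AnalysisCache:
    """In-memory stand-in for a persistent analysis cache (assumption)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        # The model is called only on a cache miss; later requests with the
        # same key return the stored text unchanged.
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]
```

Under this design the reproducibility comes from the cache and not from the model itself: a repeated request never reaches the LLM, and changing the prompt version or prompt text yields a new key, so older results are kept alongside newer ones.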