What does this research mean for the field?

The BibCrit open data infrastructure provides a reproducible, multi-tradition morphological corpus pipeline, a versioned NLP analysis system, and an open API to facilitate AI-assisted biblical textual criticism. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

This research aims to develop an open-source data infrastructure to facilitate AI-assisted biblical textual criticism.

June 5, 2026 · Open Access

BibCrit: An Open Data Infrastructure for AI-Assisted Biblical Textual Criticism

Key Points

Developed a morphological corpus pipeline covering 2,285,199 word tokens from eight biblical traditions, all serialized to a shared CSV schema.
Implemented a versioned, reproducible NLP analysis system built on structured LLM prompts and SHA-256 cache keys.
Created an open REST-style cache API that exposes the accumulated analyses for harvesting by downstream computational studies.
Accumulated a corpus of AI-generated textual analyses from fifteen analytical tools, in both English and Spanish locales.
Ensured that identical inputs produce identical outputs across sessions.
Released all code, corpus ingestion scripts, and prompt templates under Apache 2.0.
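The reproducibility claim rests on SHA-256 cache keys, but this summary does not say what goes into a key. A minimal sketch, assuming the key hashes a canonical JSON encoding of the tool name, a prompt-template version, the locale, and the tool inputs. All of these field names, the tool name "divergence", and the in-memory `AnalysisCache` class are hypothetical. On this reading, reproducibility comes from the cache rather than from the model itself: once an output is stored under its key, every later request with the same key gets that stored text back.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(tool: str, prompt_version: str, locale: str, inputs: Dict[str, Any]) -> str:
    """Return a hex SHA-256 digest over every field assumed to determine an analysis.

    The field set (tool, prompt_version, locale, inputs) is a hypothetical choice.
    """
    # Canonical JSON: sorted keys and fixed separators, so logically equal
    # requests always serialize to the same bytes and hence the same key.
    payload = json.dumps(
        {"tool": tool, "prompt_version": prompt_version, "locale": locale, "inputs": inputs},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AnalysisCache:
    """Hypothetical in-memory stand-in for a persistent analysis cache."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.llm_calls = 0  # counts how often the (simulated) model was invoked

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        # Only a cache miss invokes the model; a hit returns the stored text unchanged.
        if key not in self._store:
            self.llm_calls += 1
            self._store[key] = compute()
        return self._store[key]


if __name__ == "__main__":
    cache = AnalysisCache()
    # "divergence" and the passage reference are illustrative inputs only.
    key = cache_key("divergence", "v1", "en", {"passage": "Gen 1:1"})
    first = cache.get_or_compute(key, lambda: "output of the first model call")
    second = cache.get_or_compute(key, lambda: "a different sampled answer")
    print(first == second, cache.llm_calls)  # True 1
```

Canonical serialization matters in this design. Without sorted keys, the same inputs in a different dictionary order would hash to a different key and miss the cache. Putting the prompt version into the key means that editing a template creates new entries rather than silently reusing answers produced by the old wording.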

Abstract

This paper describes the data infrastructure underlying BibCrit, an open-source web application for AI-assisted biblical textual criticism. We document three components of independent interest to the digital humanities and data mining communities: (1) an eight-tradition morphological corpus pipeline covering 2,285,199 word tokens across the Masoretic Text, Septuagint, Dead Sea Scrolls, Samaritan Pentateuch, Peshitta Syriac, Greek New Testament, Targum (Onkelos and Jonathan), and the Latin Vulgate, all serialized to a shared CSV schema; (2) a versioned, reproducible NLP analysis system using structured LLM prompts with SHA-256 cache keys, which guarantees that identical inputs produce identical outputs across sessions; and (3) an open REST-style cache API that exposes the accumulated corpus of AI-generated textual analyses (covering fifteen analytical tools and both English and Spanish locales) for harvesting by downstream computational studies. All code, corpus ingestion scripts, and prompt templates are released under Apache 2.0 at github.com/Jossifresben/BibCrit.
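The abstract says all eight traditions share one CSV schema but does not list its columns. The sketch below assumes a hypothetical per-token layout (tradition, book, chapter, verse, position, surface, lemma, morph). The column names, the tradition codes ("MT", "LXX", "VUL"), and the simplified morphology labels are illustrative only. They are not BibCrit's actual schema or any real tagset. The sketch shows why a single schema is useful: tokens from different traditions can be written to one file and then filtered by verse reference without per-corpus parsing. Morphology is assumed to be stored as an opaque string, because each tradition's source data may use its own tagging conventions.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, List, TextIO


@dataclass(frozen=True)
class Token:
    """One word token; this column set is a hypothetical shared schema."""
    tradition: str  # hypothetical code, e.g. "MT", "LXX", "VUL"
    book: str
    chapter: int
    verse: int
    position: int   # 1-based word index within the verse
    surface: str    # word form as it appears in the source text
    lemma: str
    morph: str      # tradition-specific tag kept as an opaque string


COLUMNS = [f.name for f in fields(Token)]


def write_tokens(tokens: Iterable[Token], out: TextIO) -> None:
    # Every tradition is written with the same header and column order.
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for tok in tokens:
        writer.writerow(asdict(tok))


def read_tokens(src: TextIO) -> List[Token]:
    # CSV yields strings, so the integer reference fields are converted back.
    return [
        Token(
            tradition=r["tradition"], book=r["book"],
            chapter=int(r["chapter"]), verse=int(r["verse"]),
            position=int(r["position"]), surface=r["surface"],
            lemma=r["lemma"], morph=r["morph"],
        )
        for r in csv.DictReader(src)
    ]


if __name__ == "__main__":
    # Opening words of Genesis 1:1 in three traditions; morph labels are simplified placeholders.
    sample = [
        Token("MT", "Gen", 1, 1, 1, "בְּרֵאשִׁית", "רֵאשִׁית", "prep+noun"),
        Token("LXX", "Gen", 1, 1, 1, "Ἐν", "ἐν", "prep"),
        Token("LXX", "Gen", 1, 1, 2, "ἀρχῇ", "ἀρχή", "noun"),
        Token("VUL", "Gen", 1, 1, 1, "In", "in", "prep"),
    ]
    buf = io.StringIO()
    write_tokens(sample, buf)
    buf.seek(0)
    restored = read_tokens(buf)
    print(restored == sample)  # True
    # Align traditions by verse reference using only the shared columns.
    print(sorted({t.tradition for t in restored if (t.book, t.chapter, t.verse) == ("Gen", 1, 1)}))
    # ['LXX', 'MT', 'VUL']
```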
