What question did this study set out to answer?

This study aims to introduce computational techniques to measure and visualize similarities among textual witnesses.

June 3, 2026

Measuring similarities and visualizing patterns in a text tradition using pairwise sequence alignment and t-distributed stochastic neighbour embedding

Key Points

Introduced computational techniques to measure and visualize similarities among textual witnesses.
Applied global (Needleman–Wunsch) and local (Smith–Waterman) alignment algorithms directly to the character strings of medieval manuscripts.
Used t-distributed stochastic neighbour embedding to visualize the resulting similarity scores.
Tested a language-specific modification of the Needleman–Wunsch algorithm on two manuscripts in the corpus.
Independently replicated the two textual families that earlier manual analysis had identified in the corpus.
Discovered a previously overlooked textual subgroup and evidence that two manuscripts were written by the same scribe.
Found indications that the language-specific modification improves alignment performance for Hebrew script.
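
The global alignment step named in the key points can be illustrated with a minimal sketch. It is not the study's implementation: the scoring values (match +1, mismatch -1, gap -1), the normalization by the longer string, the conversion to a distance as 1 minus similarity, and the toy witness strings are all assumptions made for illustration.

```python
from itertools import combinations


def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of two character strings."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


def similarity(a: str, b: str) -> float:
    """Score divided by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return needleman_wunsch_score(a, b) / longest


def distance_matrix(witnesses: dict) -> tuple:
    """Pairwise distances (1 - similarity) between all witnesses, by sorted name."""
    names = sorted(witnesses)
    dist = [[0.0] * len(names) for _ in names]
    for (i, x), (j, y) in combinations(enumerate(names), 2):
        d = 1.0 - similarity(witnesses[x], witnesses[y])
        dist[i][j] = dist[j][i] = d
    return names, dist


# Hypothetical toy transcriptions, not text from the corpus.
toy = {"A": "abcd", "B": "abcd", "C": "abxd"}
names, dist = distance_matrix(toy)
```

A symmetric matrix of this kind is the sort of input a dimensionality reduction method such as t-SNE can take to place similar witnesses near each other on a two-dimensional map.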

Abstract

Traditionally, the comparison of textual witnesses is achieved through manual collation. This study introduces a computational approach that adapts methods from bioinformatics, namely pairwise sequence alignment and dimensionality reduction, to measure and visualize textual relationships across a corpus. We apply global (Needleman–Wunsch) and local (Smith–Waterman) alignment algorithms directly to character strings, generating quantitative similarity scores which are then represented through t-distributed stochastic neighbour embedding. We also test a language-specific modification of the Needleman–Wunsch algorithm on two manuscripts in our corpus. Unlike automated collation methods that aim for semantic accuracy, this approach focuses on corpus-wide similarity patterns. The test corpus contains twenty-four medieval manuscripts of the Liturgical Targum, preserved in Jewish festival prayer books. Previous (manual) philological analysis had already identified two textual families among the Targum units within these prayer books. Our computational method independently replicates these families and reflects the overall coherence of the corpus. Crucially, it enabled insights overlooked in the manual study: the identification of a previously unrecognized textual subgroup and the discovery that two manuscripts were written by the same scribe. Local alignment proves effective for identifying the closest textual parallels of a fragmentary manuscript. The test of the language-specific modification on two manuscripts indicates improved alignment performance for Hebrew script. This article demonstrates that combining pairwise sequence alignment with dimensionality reduction is a powerful exploratory tool for engaging with a text corpus. The method requires only accurate transcriptions to produce maps of textual relationships that can guide subsequent detailed collation and interpretation.
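
The use of local alignment to find the closest parallels of a fragment can be sketched as follows. This is an illustrative Smith–Waterman scorer under assumed parameters (match +2, mismatch -1, gap -1), not the study's code; the helper `closest_witness` and the toy strings are hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between any substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets an alignment restart anywhere, which is
            # what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def closest_witness(fragment: str, witnesses: dict) -> str:
    """Name of the witness whose text best matches the fragment locally."""
    return max(witnesses,
               key=lambda name: smith_waterman_score(fragment, witnesses[name]))


# Hypothetical toy data: a short fragment compared against two fuller witnesses.
toy = {"A": "xxabcdexx", "B": "xxbxdxx"}
best_match = closest_witness("bcd", toy)
```

Because the score ignores unmatched text on either side of the best-matching region, a short fragment is not penalized for being much shorter than the complete witnesses it is compared with.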
