This study proposes a novel data-preparation method for lexicostatistics that may help uncover historical divergence and language contact among Ainu dialects. It addresses a methodological limitation of conventional approaches, which count recurring regularities redundantly and thereby introduce statistical bias into lexical-item data. Our method first systematically extracts almost all potential regularity data. It then extracts lexical-item data as the linguistic information not captured by the regularity data, which prevents the artificial inflation of specific patterns seen in previous data. Our revised homogeneity analysis is applied to two datasets: one consisting solely of lexical-item data and another combining lexical items with regularity data. The quantification results for the Ainu dialects are visualized as interactive 3D graphs using HTML5. In the visualization of the combined dataset, the Asahikawa, Nayoro, and Soya dialects lie near the coordinate origin, which represents the “average” characteristics, suggesting historical language contact and mixture in these dialects. By contrast, the visualization of the lexical-item data reveals a clear north–south division in the Sakhalin dialects: the northern group resembles the Saru-Chitose dialects of southwestern Hokkaido, and the southern group resembles the northeastern Hokkaido dialects. This A–B–A geolinguistic distribution had not been identified before our analyses. These findings demonstrate that our framework can integrate geolinguistic and historical-linguistic perspectives: the former aligns with the lexical-item data in our data-preparation method, and the latter corresponds to the regularity data. Thus, our data-preparation and quantification methods will shift the focus of lexicostatistics from classification back to history, its original interest.
Ono et al. (Fri,) studied this question.
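To make the quantification step more concrete, the following is a minimal sketch of a standard homogeneity analysis (multiple correspondence analysis), not the revised variant proposed in this study, whose details are not given here. The function names, the dialect labels, and the toy category codes are all hypothetical and serve only as an illustration. The sketch one-hot encodes categorical data, applies correspondence analysis through a singular value decomposition, and returns three-dimensional object scores whose mean lies at the coordinate origin, which is why points near the origin can be read as "average" objects.

```python
import numpy as np


def indicator_matrix(codes):
    """One-hot encode an (n_objects, n_variables) array of category codes."""
    codes = np.asarray(codes)
    blocks = []
    for j in range(codes.shape[1]):
        categories = np.unique(codes[:, j])
        # One column per category observed for variable j.
        blocks.append((codes[:, j][:, None] == categories[None, :]).astype(float))
    return np.hstack(blocks)


def homogeneity_scores(codes, n_dims=3):
    """Object scores from a standard homogeneity analysis (MCA).

    This is the textbook formulation via correspondence analysis of the
    indicator matrix; it is an assumption that it approximates the
    analysis described in the text.
    """
    G = indicator_matrix(codes)
    P = G / G.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row (object) masses
    c = P.sum(axis=0)                    # column (category) masses
    # Standardized residuals from the independence model.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of the objects (rows).
    return (U[:, :n_dims] * sing[:n_dims]) / np.sqrt(r)[:, None]


# Hypothetical toy data: 5 dialects (rows) x 4 lexical items (columns);
# each integer is an invented category code for that item.
dialect_labels = ["D1", "D2", "D3", "D4", "D5"]
toy_codes = np.array([
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 2, 0, 1],
    [0, 0, 1, 1],
])

scores = homogeneity_scores(toy_codes, n_dims=3)
for label, xyz in zip(dialect_labels, scores):
    print(label, np.round(xyz, 3))
```

Each row of `scores` gives one object's coordinates in three dimensions, which could then be passed to any 3D plotting tool. Because every object contributes the same number of categories, the scores are centered, so distance from the origin reflects how far an object departs from the average profile.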