This paper continues an ongoing investigation of measures of distinctiveness (also known as keyness measures), this time through a qualitative, comparative evaluation of three measures: logarithmic Zeta, Welch’s t-test, and the log-likelihood ratio test. Our domain of application is the contemporary French novel, more specifically four types of French novels from the period 1970-1999: the sentimental novel, the crime novel, science fiction, and littérature blanche (literary fiction). Our evaluation proceeds in three steps. First, we establish important abstract characteristics of each literary subgenre based on a synthesis of close readings of the scholarly literature on these subgenres, resulting in qualitative, expert-based subgenre profiles. Second, we use a purely statistical approach, namely the three measures of distinctiveness, to identify words that are expected to be statistically typical or characteristic of a group of texts, such as a subgenre, when compared to other texts. Finally, we compare expertise and statistics: for each of the four subgenres, we attempt to map individual words found to be statistically distinctive of the subgenre onto specific aspects of the relevant subgenre profile, and we count the matches. It turns out that each measure yields a different list of most distinctive words, which therefore relates differently to the subgenre profiles. The analysis of these varying degrees of overlap contributes to a better understanding of the characteristics of, and differences between, the three measures, while also serving as an example of a qualitative evaluation of a statistical measure.
Röttgermann et al. studied this question.
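To make the statistical step more concrete, the following minimal Python sketch shows how the three measures could be computed for a single word, given a target group and a comparison group of tokenized texts. It is an illustration under stated assumptions, not the implementation used in the study: the function names, the toy token lists, the particular log-transformed Zeta variant (difference of log2-transformed document proportions), and the two-term form of the log-likelihood statistic are all assumptions made here, and the definitions used in the paper may differ.

```python
import math
from statistics import mean, variance

# Toy data invented for illustration only: each text is a list of tokens.
target = [["amour", "coeur", "amour", "nuit"],
          ["amour", "regard", "coeur", "jour"]]
comparison = [["meurtre", "police", "nuit", "enquête"],
              ["police", "jour", "meurtre", "arme"]]

def log_zeta(word, target_docs, comparison_docs):
    """Assumed Zeta variant: difference of log2(1 + document proportion)."""
    dp_t = sum(word in doc for doc in target_docs) / len(target_docs)
    dp_c = sum(word in doc for doc in comparison_docs) / len(comparison_docs)
    return math.log2(1 + dp_t) - math.log2(1 + dp_c)

def welch_t(word, target_docs, comparison_docs):
    """Welch's t statistic on per-text relative frequencies (needs >= 2 texts per group)."""
    x = [doc.count(word) / len(doc) for doc in target_docs]
    y = [doc.count(word) / len(doc) for doc in comparison_docs]
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return float("nan")  # undefined when neither group shows any variation
    return (mean(x) - mean(y)) / se

def log_likelihood(word, target_docs, comparison_docs):
    """Two-term log-likelihood (G2) statistic on pooled word counts (assumed form)."""
    a = sum(doc.count(word) for doc in target_docs)      # count in target
    b = sum(doc.count(word) for doc in comparison_docs)  # count in comparison
    c = sum(len(doc) for doc in target_docs)             # target size in tokens
    d = sum(len(doc) for doc in comparison_docs)         # comparison size in tokens
    e1 = c * (a + b) / (c + d)  # expected count in target
    e2 = d * (a + b) / (c + d)  # expected count in comparison
    g2 = sum(obs * math.log(obs / exp)
             for obs, exp in ((a, e1), (b, e2)) if obs > 0)
    return 2 * g2

for w in ("amour", "nuit"):
    print(w, log_zeta(w, target, comparison),
          welch_t(w, target, comparison),
          log_likelihood(w, target, comparison))
```

Ranking the vocabulary by each score in turn would give three candidate lists of distinctive words. Because Zeta relies on how many texts contain a word, Welch's t-test on the spread of per-text frequencies, and the log-likelihood statistic on pooled counts, such lists can be expected to differ, which is the kind of divergence the qualitative comparison examines.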