A standardized effect size for the mean difference between two independent groups is frequently reported in published research. But its conceptual simplicity masks underlying complexities in both form and usage. Three types of effect sizes commonly used in research are considered (two proposed by Jacob Cohen and one by Gene Glass). All three suffer from what is called "the curse of the standardizer" under variance heterogeneity. Previous rationales for preferring Glass's Δ when group variances are unequal are critically reviewed. It is argued that the twin problems of (a) bias in all three effect sizes under violations of assumptions and (b) poor coverage rates in existing confidence interval estimators represent greater threats to valid inference than misplaced preferences for one type of effect size. A new heteroscedastic-consistent interval estimator is proposed for all three effect sizes under misspecification of assumptions to address shortcomings in existing confidence intervals. Its accuracy and robustness under varying levels of empirically guided non-normality are shown to be superior to existing large-sample normal confidence intervals proposed by Hedges (1981) and Bonett (2008). Bias in effect sizes is found to be widespread under non-normality. It may result in misleading inferences about the size of an effect in many instances. Glass's Δ is found to be problematic under its recommended conditions. Improved bias-correction for estimates under non-normality is an important focus for future research. The overall findings point to limited validity in inferences when using these three effect sizes in many research conditions commonly encountered in practice. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Paul Dudgeon studied this question.
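The abstract names three effect sizes without giving their formulas. As a rough illustration, the sketch below computes the textbook versions. It assumes that the two Cohen variants are the mean difference divided by the pooled standard deviation and by the square root of the average of the two group variances, and that Glass's Δ divides by the control group's standard deviation. The function name `effect_sizes` and the dictionary keys are hypothetical, chosen only for this example. The sketch does not implement the proposed interval estimator or any bias correction.

```python
import math
from statistics import mean, variance


def effect_sizes(treatment, control):
    """Return three standardized mean differences for two independent samples.

    Assumed textbook definitions (not taken from the abstract):
      d_pooled    : mean difference / pooled standard deviation
      d_average   : mean difference / sqrt of the average of the two variances
      glass_delta : mean difference / control-group standard deviation
    """
    n1, n2 = len(treatment), len(control)
    diff = mean(treatment) - mean(control)

    # Unbiased (n - 1 denominator) sample variances for each group.
    v1, v2 = variance(treatment), variance(control)

    # Three different standardizers for the same mean difference.
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    average_sd = math.sqrt((v1 + v2) / 2)
    control_sd = math.sqrt(v2)

    return {
        "d_pooled": diff / pooled_sd,
        "d_average": diff / average_sd,
        "glass_delta": diff / control_sd,
    }


if __name__ == "__main__":
    # Small made-up samples with unequal variances and unequal group sizes.
    result = effect_sizes([2, 4, 6, 8, 10], [1, 2, 3])
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

With these made-up samples (variances of 10 and 1), the same mean difference of 4 yields roughly 1.51, 1.71, and 4.00 depending on the standardizer, which is a small-scale picture of how the choice of standardizer drives the reported effect size under variance heterogeneity. With equal group sizes the pooled and average-variance versions coincide.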