This paper advocates the systematic use of effect size measures, such as Cohen’s d and Cliff’s δ, in the comparative evaluation of metaheuristic optimization methods. While statistical significance tests remain the dominant tool for determining whether observed performance differences are unlikely to be due to chance, they do not convey the magnitude or practical relevance of these differences. As a result, many studies report “significant” results that offer limited insight into how much one algorithm actually outperforms another. Although effect size measures address this gap directly, their adoption within the metaheuristics community remains limited. This paper presents effect sizes as a complement to, not a replacement for, traditional hypothesis testing. Through illustrative examples and discussion, we demonstrate how effect sizes provide a more informative and transparent basis for interpreting algorithmic superiority, contributing to more robust, reproducible, and meaningful empirical comparisons. The paper aims to encourage researchers to integrate effect size reporting into standard practice, thereby strengthening the methodological foundations of experimental research in metaheuristics.
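As a minimal sketch of the two measures named above, the following Python snippet computes Cohen’s d (using the pooled sample standard deviation) and Cliff’s δ (using all pairwise comparisons) for two sets of run results. The function names `cohens_d` and `cliffs_delta` and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def cliffs_delta(a, b):
    """Estimate P(a > b) - P(a < b) over all pairs; the result lies in [-1, 1]."""
    greater = sum(x > y for x, y in product(a, b))
    less = sum(x < y for x, y in product(a, b))
    return (greater - less) / (len(a) * len(b))


# Hypothetical final objective values (minimization) from 5 independent runs each.
alg_a = [1.0, 2.0, 3.0, 4.0, 5.0]
alg_b = [2.0, 3.0, 4.0, 5.0, 6.0]

d = cohens_d(alg_a, alg_b)
delta = cliffs_delta(alg_a, alg_b)
print(f"Cohen's d = {d:.3f}, Cliff's delta = {delta:.3f}")
```

Under minimization, negative values of either measure indicate that the first algorithm tends to reach lower objective values. Cohen’s d assumes roughly normal data with similar variances, whereas Cliff’s δ is ordinal and makes no such assumption, which is one reason the two are often reported together.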
Ismail et al. studied this question.