Context. Empirical Software Engineering (ESE) drives innovation in software engineering (SE) through qualitative and quantitative studies. Concerns about methodological rigor have persisted since the 2006 Dagstuhl seminar. Recent studies have highlighted misconceptions in statistical practices in ESE, yet the impact of these misconceptions on the field’s progress and verifiability remains uninvestigated. Aim. To analyze three decades of SE research and uncover flaws in the statistical methods used for data analysis in empirical studies, and then to assess how well current ESE experts can identify and address these flaws. Method. We conducted a large-scale literature survey, collecting over 27,000 empirical SE papers. Using a Large Language Model (LLM), we categorized the studies as methodologically adequate or inadequate, and selected 28 primary studies (14 from each category) for expert evaluation in a focus-group-based workshop. Results. Our findings reveal widespread misuse of statistical methods in empirical SE studies. Additionally, experts often struggle to detect these flaws and to provide proper corrections, raising concerns about methodological rigor in the field. Conclusions. This study highlights the risk of perpetuating statistical misconceptions and calls for a critical reform of the approach to data analysis in ESE. We advocate developing frameworks that foster methodological awareness and rigor.
Esposito et al. studied this question.