This study presents the first application of e-value-based false discovery rate (FDR) control to Differential Item Functioning (DIF) detection, addressing long-standing limitations of p-value-based approaches when model assumptions are violated, for example under multidimensionality, local item dependence, or extreme sample sizes. Two comprehensive simulation studies were conducted to evaluate e-BH procedures (the e-value analogue of the Benjamini–Hochberg procedure, BH), using K-fold and Multisplit likelihood-ratio e-values, under (a) multidimensional contamination and (b) testlet-based local dependence. Across both scenarios, e-BH consistently provided stronger and more stable control of Type I error, FDR, and family-wise error rate (FWER) than classical procedures such as BH and Holm. Even under severe model misspecification, e-BH maintained substantially lower false-positive rates while remaining relatively competitive in terms of Type II error. A key finding concerns sample size: classical p-value methods exhibited inflation of Type I error as N increased, whereas e-BH preserved stable error control due to its model-agnostic calibration. An empirical application using Progress in International Reading Literacy Study (PIRLS) data further demonstrated that e-BH produces a more defensible and operationally sustainable set of DIF flags than traditional approaches. Together, these results establish e-values as a powerful and robust evidential tool for DIF detection in modern assessment contexts.
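For readers unfamiliar with e-BH, the following is a minimal sketch of the generic e-BH decision rule: with m hypotheses, reject the k* largest e-values, where k* is the largest k such that the k-th largest e-value is at least m / (k · alpha). It is not the authors' implementation. The function name `e_bh` and the per-item e-values are hypothetical, and the construction of the K-fold or Multisplit likelihood-ratio e-values themselves is not shown.

```python
from typing import List, Sequence


def e_bh(e_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return the (sorted) indices of hypotheses rejected by e-BH at level alpha."""
    m = len(e_values)
    # Order hypotheses from the largest e-value (strongest evidence) down.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # Keep the largest k whose k-th largest e-value satisfies
        # k * e_[k] >= m / alpha.
        if k * e_values[idx] >= m / alpha:
            k_star = k
    return sorted(order[:k_star])


# Hypothetical e-values for six test items (illustrative only, not from the study).
item_e_values = [150.0, 0.8, 45.0, 1.2, 62.0, 3.0]
flagged = e_bh(item_e_values, alpha=0.05)
print(flagged)  # indices of items flagged for DIF
```

With these illustrative inputs the threshold m / alpha is 120, so the three largest e-values (150, 62, 45) pass at k = 1, 2, 3 and the remaining items are not flagged.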
Shan Huang
Chungwoon University
David Goretzko
Goethe University Frankfurt
Educational and Psychological Measurement
synapsesocial.com/papers/69e320af40886becb653fc2e — DOI: https://doi.org/10.1177/00131644261433236