Unusual outcome variance contributed to uncovering major cases of research misconduct, leading to over 200 retractions. Detecting such problematic randomized trials early, before they unduly influence clinical guidelines, remains challenging. Empirical evidence indicates that differences in variances between trial arms (DiVBTAs) are usually small and non-significant in properly conducted trials. This study investigated whether the converse, unusually large and statistically significant DiVBTAs, can serve as a red flag for potentially problematic trials. We conducted simulations to assess the sensitivity and specificity of a DiVBTA-based decision rule under realistic scenarios, including proper randomization, heterogeneous treatment effects, and missing-not-at-random data. In parallel, we applied the rule in a real-world analysis of 226 systematically sampled randomized trials in diabetes research to assess whether unusually large and statistically significant DiVBTAs occur with sufficient frequency to warrant screening. Unusually large DiVBTA values were defined as those falling outside the 3-sigma prediction limits. Simulations demonstrated high specificity, with legitimate trials rarely flagged (low false-positive rate), and adequate sensitivity for detecting a specific form of severe fabrication. In the empirical analysis, 19 of 226 trials (8%) were flagged as potentially problematic, demonstrating the utility of screening trials for unusually large and statistically significant DiVBTAs. Further examination of the flagged trials revealed additional concerns in 18 of the 19. These findings suggest that screening for unusually large, statistically significant DiVBTAs offers a simple, low-effort tool to identify trials warranting further scrutiny, potentially strengthening the reliability of evidence used in clinical guidelines.
Hujoel et al. (Wed,) studied this question.
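
To make the decision rule concrete, the following is a minimal sketch of how a single two-arm trial might be screened from its reported standard deviations and sample sizes. The abstract does not specify how the DiVBTA or its prediction limits are computed, so everything here is an assumption for illustration: the DiVBTA is taken to be the log ratio of the arm variances, its standard error uses the large-sample normal-theory approximation sqrt(2/(n_a - 1) + 2/(n_b - 1)), and the function name `divbta_flag` and its parameters are hypothetical. It is not the authors' implementation.

```python
import math


def divbta_flag(sd_a, n_a, sd_b, n_b, sigma_limit=3.0, alpha=0.05):
    """Screen one two-arm trial for an unusually large variance difference.

    Hypothetical helper: the log variance ratio and its normal-theory
    standard error are assumptions, not the method used in the study.
    """
    if min(n_a, n_b) < 2:
        raise ValueError("each arm needs at least 2 participants")
    if min(sd_a, sd_b) <= 0:
        raise ValueError("standard deviations must be positive")

    # Log ratio of variances: ln(sd_a^2 / sd_b^2) = 2 * ln(sd_a / sd_b).
    log_vr = 2.0 * math.log(sd_a / sd_b)

    # Approximate standard error of the log variance ratio, assuming
    # roughly normal outcomes in both arms.
    se = math.sqrt(2.0 / (n_a - 1) + 2.0 / (n_b - 1))

    # Standardized difference and two-sided p-value (normal approximation).
    z = log_vr / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    # Flag only if the value is both outside the sigma limits and significant.
    flagged = abs(z) > sigma_limit and p_value < alpha
    return {"log_vr": log_vr, "z": z, "p_value": p_value, "flagged": flagged}


# Similar spread in both arms: not flagged.
print(divbta_flag(sd_a=10.0, n_a=100, sd_b=10.5, n_b=100)["flagged"])

# One arm has half the spread of the other: flagged.
print(divbta_flag(sd_a=10.0, n_a=100, sd_b=20.0, n_b=100)["flagged"])
```

Because both criteria in this simplified version are derived from the same z statistic, the 3-sigma condition is the stricter one and decides the flag alone. A screening tool closer to the study design would presumably derive the prediction limits from an empirical reference distribution of trials and use a separate test for equality of variances.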