June 28, 2024

In Search of Unicorns: Assessing Statistical Assumptions in Real Psychology Datasets


Abstract

There are many statistical metrics that can affect the performance of ordinary least squares (OLS) models, including model design, sample size, and the extent to which statistical assumptions are violated. Previous studies summarising the distributional characteristics of observed variables in psychology have shown that measures are rarely normally distributed, and that violations of homoscedasticity are common. We collected a sample of 588 OLS models from 119 published and unpublished papers and extracted distributional information from the model residuals as a more direct indicator of potential assumption violations. Further, we summarised general model information, including type and number of predictors, sample size, and sample size ratios for categorical predictors. We estimated the typical values and the plausible ranges of these values for each metric using Bayesian estimates and Highest Posterior Density Intervals from intercept-only models. We found that violations of normal and homoscedastic errors are common, and that models tend to have more values at the tails of the residual distributions than would be expected in a model that meets the OLS assumptions. We discuss the implications of these findings in the context of preregistration and power analyses. We also provide guidance for applying alternative models which might be more appropriate for estimation and hypothesis testing than OLS models in the situations we identified as common in practice.
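To illustrate what "distributional information from the model residuals" can look like in practice, here is a minimal sketch using only NumPy. It is not the authors' pipeline (the abstract describes Bayesian intercept-only models, which are not reproduced here). The function names `ols_residuals`, `residual_summary`, and `variance_ratio` are hypothetical, and the data are simulated with heavy-tailed, unequal-variance errors purely for demonstration.

```python
import numpy as np

def ols_residuals(X, y):
    """Fit OLS with an intercept and return the residuals."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def residual_summary(resid):
    """Skewness and excess kurtosis; both are 0 for a normal distribution."""
    z = (resid - resid.mean()) / resid.std()
    return {
        "skewness": float(np.mean(z**3)),
        "excess_kurtosis": float(np.mean(z**4) - 3.0),
    }

def variance_ratio(resid, groups):
    """Largest over smallest residual variance across levels of a categorical predictor."""
    variances = [resid[groups == g].var(ddof=1) for g in np.unique(groups)]
    return float(max(variances) / min(variances))

# Simulated data (an assumption for illustration, not data from the paper):
# a two-group predictor with t-distributed errors whose scale differs by group.
rng = np.random.default_rng(0)
n = 2000
groups = rng.integers(0, 2, size=n)
errors = rng.standard_t(df=4, size=n) * np.where(groups == 1, 2.0, 1.0)
y = 1.0 + 0.5 * groups + errors

resid = ols_residuals(groups, y)
summary = residual_summary(resid)
ratio = variance_ratio(resid, groups)
print(summary, ratio)
```

Positive excess kurtosis indicates more mass in the tails than a normal distribution would produce, and a variance ratio well above 1 suggests heteroscedasticity across groups. Both are descriptive indicators rather than formal tests.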
