ABSTRACT Count data, such as product sales and disease case counts, are common in business forecasting and many areas of science. Although the Poisson distribution is the best known model for such data, its use is severely limited by its assumption that the dispersion is a fixed function of the mean, which rarely holds in real‐world scenarios. This assumption is especially problematic in forecasting because it yields unreliable estimates of predictive uncertainty and thereby compromises risk assessment. While various alternatives exist, they have significant limitations of their own: they may work only in specific settings (such as overdispersed data), lack interpretability, or require extensive computation, rendering them unusable on data sets of moderate to large size. In this paper, we survey the current landscape of count data modeling in the presence of varying dispersion, develop extensions to previously proposed methods, and introduce a computationally efficient latent‐variable approach based on a discrete log‐normal distribution. The method allows for unconstrained dispersion, ranging from underdispersion to overdispersion (relative to the Poisson model), that may vary as a function of covariates. We develop scalable algorithms for fitting this model, resulting in a 98% reduction in computation time in our simulation relative to a leading competitor, a Conway–Maxwell–Poisson regression model. We compare the methods in a simulation study and in a case study on grocery sales forecasting. In the case study, the discrete log‐normal model outperforms the other methods in terms of squared prediction error. Compared to a negative binomial model, it produces prediction intervals that are 24% narrower on average without a commensurate decrease in coverage. In practice, these improvements would enable more precise risk assessment and resource allocation in business, epidemiology, and other domains.
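To illustrate how a discretized log-normal latent variable can cover both underdispersion and overdispersion, the following minimal sketch may help. It assumes one common construction, Y = floor(exp(Z)) with Z ~ Normal(mu, sigma^2); the abstract does not specify the discretization, so this choice, the function names (`norm_cdf`, `dln_pmf`, `dispersion_index`), and the truncation point `y_max` are illustrative assumptions and not necessarily the formulation used in the paper.

```python
import math


def norm_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dln_pmf(y, mu, sigma):
    """P(Y = y) under the ASSUMED construction Y = floor(exp(Z)),
    Z ~ Normal(mu, sigma^2). Hypothetical helper, not the paper's code."""
    upper = norm_cdf((math.log(y + 1) - mu) / sigma)
    # For y = 0 the lower limit is log(0) = -inf, where the CDF is 0.
    lower = 0.0 if y == 0 else norm_cdf((math.log(y) - mu) / sigma)
    return upper - lower


def dispersion_index(mu, sigma, y_max=5000):
    """Variance-to-mean ratio, summing the pmf over 0..y_max.
    y_max is an assumed truncation; it must cover essentially all mass."""
    mean = 0.0
    second_moment = 0.0
    for y in range(y_max + 1):
        p = dln_pmf(y, mu, sigma)
        mean += y * p
        second_moment += y * y * p
    variance = second_moment - mean ** 2
    return variance / mean


mu = math.log(20.0)  # latent location; counts concentrate near 20
narrow = dispersion_index(mu, sigma=0.1)  # small latent scale
wide = dispersion_index(mu, sigma=0.5)    # large latent scale
print(f"sigma=0.1: variance/mean = {narrow:.3f}")
print(f"sigma=0.5: variance/mean = {wide:.3f}")
```

A Poisson model fixes the variance-to-mean ratio at 1. In this sketch, the small latent scale gives a ratio below 1 (underdispersion) and the large latent scale gives a ratio above 1 (overdispersion). Letting sigma depend on covariates would allow the dispersion to vary across observations in the way the abstract describes.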
Huch et al. (Mon,) studied this question.