Abstract: The success of multi‐model ensemble combination has been demonstrated in many studies. Given that a multi‐model contains information from all participating models, including the less skilful ones, the question remains as to why, and under what conditions, a multi‐model can outperform the best participating single model. It is the aim of this paper to resolve this apparent paradox. The study is based on a synthetic forecast generator, allowing the generation of perfectly‐calibrated single‐model ensembles of any size and skill. Additionally, the degree of ensemble under‐dispersion (or overconfidence) can be prescribed. Multi‐model ensembles are then constructed from both weighted and unweighted averages of these single‐model ensembles. Applying this toy model, we carry out systematic model‐combination experiments. We evaluate how multi‐model performance depends on the skill and overconfidence of the participating single models. It turns out that multi‐model ensembles can indeed locally outperform a ‘best‐model’ approach, but only if the single‐model ensembles are overconfident. The reason is that multi‐model combination reduces overconfidence, i.e. ensemble spread is widened while average ensemble‐mean error is reduced. This implies a net gain in prediction skill, because probabilistic skill scores penalize overconfidence. Under these conditions, even the addition of an objectively‐poor model can improve multi‐model skill. It seems that simple ensemble inflation methods cannot yield the same skill improvement. Using seasonal near‐surface temperature forecasts from the DEMETER dataset, we show that the conclusions drawn from the toy‐model experiments hold equally in a real multi‐model ensemble prediction system. Copyright © 2008 Royal Meteorological Society
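The mechanism described in the abstract (pooling overconfident ensembles widens the spread while reducing the ensemble-mean error) can be illustrated with a minimal Python sketch. This is not the authors' synthetic forecast generator. It assumes a simple signal-plus-noise setup in which `alpha` sets the predictable signal, `beta` sets a model-specific error shared by all members of one ensemble (the source of overconfidence), and the member noise variance is reduced to `1 - alpha**2 - beta**2` so that total forecast variance stays at 1. The function names, parameter values, and ensemble sizes are all hypothetical.

```python
import numpy as np


def simulate_ensemble(signal, alpha, beta, n_members, rng):
    """Draw one toy single-model ensemble per forecast case.

    Assumed (illustrative) model: member = signal + shared_error + member_noise,
    where `signal` already has variance alpha**2.  With beta = 0 the members
    have the same noise variance as the observation; beta > 0 adds an error
    common to all members and shrinks the spread, i.e. overconfidence.
    """
    noise_var = 1.0 - alpha**2 - beta**2
    if noise_var < 0:
        raise ValueError("alpha**2 + beta**2 must not exceed 1")
    n_cases = signal.size
    shared_error = rng.normal(0.0, beta, size=(n_cases, 1))
    member_noise = rng.normal(0.0, np.sqrt(noise_var), size=(n_cases, n_members))
    return signal[:, None] + shared_error + member_noise


def summarize(ensemble, obs):
    """Return (RMSE of the ensemble mean, average ensemble spread)."""
    rmse = np.sqrt(np.mean((ensemble.mean(axis=1) - obs) ** 2))
    spread = np.sqrt(np.mean(ensemble.var(axis=1, ddof=1)))
    return rmse, spread


rng = np.random.default_rng(0)
alpha, beta, n_cases, n_members = 0.6, 0.6, 20000, 20  # hypothetical values

# Predictable signal plus unpredictable noise gives the observation.
signal = rng.normal(0.0, alpha, size=n_cases)
obs = signal + rng.normal(0.0, np.sqrt(1.0 - alpha**2), size=n_cases)

# Two equally skilful, equally overconfident models with independent errors.
model_a = simulate_ensemble(signal, alpha, beta, n_members, rng)
model_b = simulate_ensemble(signal, alpha, beta, n_members, rng)

# Unweighted multi-model: pool the members of both ensembles.
multi = np.concatenate([model_a, model_b], axis=1)

rmse_a, spread_a = summarize(model_a, obs)
rmse_mm, spread_mm = summarize(multi, obs)

print(f"single model: RMSE={rmse_a:.3f}, spread={spread_a:.3f}")
print(f"multi-model:  RMSE={rmse_mm:.3f}, spread={spread_mm:.3f}")
```

In this sketch the pooled ensemble has a spread-to-error ratio closer to 1 than either single model, because the independent shared errors partly cancel in the mean and reappear as between-model spread. Setting `beta = 0` removes the shared error, so the only gain from pooling comes from the larger number of members.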
Weigel et al. (2008) studied this question.