What question did this study set out to answer?

How the predictive accuracy of large language models (LLMs) relates to their correlation with human forecasts, with a focus on the accuracy-correlation effect (ACE).

April 18, 2026 | Open Access

Crowdsourced versus large language models forecasting: evidence for the accuracy–correlation effect

Key Points

Used 76 model × prompt forecast sets from 16 LLMs.
Analyzed 580 resolved ForecastBench questions, treating database questions (n = 526) and prediction-market questions (n = 54) separately.
Computed LLM accuracy and its correlation with two human aggregates: superforecasters and the general public.
Employed linear mixed-effects models for statistical analysis.
Found a robust positive association between LLM accuracy and human-AI correlation.
Correlation was higher with the general public than with superforecasters.
Market question correlations were weaker than those for data questions.
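Results suggest that simultaneous increases in accuracy and correlation may reduce the optimal weight on human forecasts in data-rich settings, while human judgement retains value in contextual reasoning scenarios.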
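
The two quantities behind these key points, forecast accuracy and human-AI correlation, can be illustrated with a minimal sketch. It assumes the Brier score as the accuracy measure and the Pearson coefficient as the correlation measure; this summary does not state which metrics the paper uses. The forecasts and outcomes below are hypothetical toy values, not data from the study.

```python
import numpy as np


def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    forecasts = np.asarray(forecasts, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((forecasts - outcomes) ** 2))


def pearson_correlation(a, b):
    """Pearson correlation between two forecast vectors over the same questions."""
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical toy data: five resolved binary questions.
outcomes = np.array([1, 0, 1, 1, 0])
llm = np.array([0.8, 0.3, 0.6, 0.9, 0.2])    # one model x prompt forecast set (assumed)
human = np.array([0.7, 0.4, 0.7, 0.8, 0.1])  # one human aggregate (assumed)

llm_brier = brier_score(llm, outcomes)
human_ai_corr = pearson_correlation(llm, human)

print(f"LLM Brier score: {llm_brier:.3f}")
print(f"Human-AI correlation: {human_ai_corr:.3f}")
```

Repeating this calculation for each forecast set and each human aggregate would give one accuracy value and one correlation value per set, which is the kind of paired data a mixed-effects analysis could then relate.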

Abstract

Over the past quarter century, crowdsourced forecasting has largely outperformed individual forecasters. Today, large language models (LLMs), aggregating human knowledge at scale, constitute a new form of collective intelligence (CI). A central question is how LLM predictive accuracy associates with human-AI correlation, and whether this relationship exceeds what would be expected if both merely track the same underlying truth. We investigate this through the accuracy-correlation effect (ACE), which posits that as algorithmic systems improve, they increasingly correlate with human predictions, potentially diminishing human value in hybrid ensembles. Using 76 model × prompt forecast sets from 16 LLMs on 580 resolved ForecastBench questions, we computed LLM accuracy and correlations with two human aggregates (superforecasters, general public), separately for database (n = 526) and prediction-market (n = 54) questions. Linear mixed-effects models show a robust positive association between LLM accuracy and human-AI correlation that substantially exceeds independent-errors predictions. Correlations were lower for superforecasters than for the general public, and weaker for market questions than for database questions. These results support ACE while indicating that increasing correlation reflects more than improved signal tracking alone. They suggest that simultaneous increases in accuracy and correlation may reduce optimal human weights in data-rich settings, while human judgement retains critical value in contextual reasoning scenarios. This article is part of the theme issue 'The evolution of collective intelligence'.
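
The intuition behind shrinking human weights can be sketched with the classical minimum-variance rule for combining two unbiased forecasts whose errors are correlated. This is a textbook illustration under stated assumptions, not the analysis reported in the paper; the function name and the numeric inputs are hypothetical.

```python
def optimal_human_weight(sd_human, sd_llm, rho):
    """Weight w on the human forecast that minimizes the error variance of
    w * human + (1 - w) * llm, assuming both forecasts are unbiased.

    sd_human, sd_llm: standard deviations of each forecaster's errors.
    rho: correlation between the two forecasters' errors.
    The result is unconstrained, so it can fall outside [0, 1].
    """
    cov = rho * sd_human * sd_llm
    denom = sd_human ** 2 + sd_llm ** 2 - 2 * cov
    if denom <= 0:
        raise ValueError("errors are perfectly correlated with equal variance; weight is undefined")
    return (sd_llm ** 2 - cov) / denom


# Hypothetical inputs: the human error spread is fixed at 1.0.
equal_skill = optimal_human_weight(1.0, 1.0, 0.0)      # equally accurate, independent errors
better_llm = optimal_human_weight(1.0, 0.5, 0.0)       # LLM more accurate, independent errors
better_and_corr = optimal_human_weight(1.0, 0.5, 0.5)  # LLM more accurate, correlated errors

print(equal_skill, better_llm, better_and_corr)
```

Under these assumptions the human weight falls from 0.5 to 0.2 when the LLM's error spread is halved, and to 0 once the error correlation also rises to 0.5. That is the pattern the abstract describes: accuracy and correlation rising together leave less room for the human aggregate in a hybrid ensemble.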
