April 15, 2026 (Open Access)

Random Forest Predicts Human Ratings of Creative Stories Using Very Small Training Samples

Key Points

Investigates whether machine learning can simulate expert creativity assessments from small training samples.
Uses a dataset of 411 short stories.
Compares Random Forest, Gradient Boosted Trees, and Decision Tree models.
Predicts expert ratings from two features: story length and Divergent Semantic Integration.
Identifies the best-performing algorithm and the minimum training sample size needed for reliable prediction.
Random Forest outperformed the other models, correlating highly with Consensual Assessment Technique (CAT) ratings (r = 0.80).
Reliable predictions were achieved with as few as 25 training stories.
Random Forest was more accurate and less dependent on story length than LLM-based scoring models.

Abstract

The Consensual Assessment Technique (CAT) is a gold standard of creativity assessment that provides valid, product-based creativity scores that are contextually grounded (they stem from raters with unique expertise who are culturally and historically situated). However, its implementation is often demanding (raters' burden, complex rating designs). This study investigates whether machine learning can effectively simulate expert-panel judgments of creativity using minimal training data. Using a dataset of 411 short stories, we compared the performance of Random Forest (RF), Gradient Boosted Trees, and Decision Tree models, each using story length and Divergent Semantic Integration as predictors of expert CAT ratings, in order to identify (1) the optimal algorithm and (2) the minimum training sample size required for reliable prediction. Results indicate that RF consistently outperformed the other algorithms, achieving high correlations with CAT scores (r = 0.80) using as few as 25 training stories. Furthermore, RF demonstrated superior accuracy and lower reliance on story length compared to LLM-based scoring models. These findings provide a robust proof of concept for using simulated expert panels as a scalable alternative to (decontextualized) automated assessment methods, while reducing human raters' burden and the logistical constraints of complex rating designs. Extending this work to different contexts, creativity tasks, and domains is necessary to gauge its generalizability.

Authors

Baptiste Barbot

Yale University

Thomas Calogero Kiekens

UCLouvain

Journals

Behavioral Sciences
