The Consensual Assessment Technique (CAT) is a gold standard of creativity assessment that provides valid, product-based creativity scores that are contextually grounded (stemming from raters with unique expertise, culturally and historically situated). However, its implementation is often demanding (raters’ burden, complex rating designs). This study investigates whether machine learning can effectively simulate expert-panel judgments of creativity using minimal training data. Using a dataset of 411 short stories, we compared the performance of Random Forest (RF), Gradient Boosted Trees, and Decision Tree models, based on story length and Divergent Semantic Integration, in predicting expert CAT ratings. We sought to identify (1) the optimal algorithm and (2) the minimum training sample size required for reliable prediction. Results indicate that RF consistently outperformed the other algorithms, achieving high correlations with CAT scores (r = 0.80) using as few as 25 training stories. Furthermore, RF demonstrated superior accuracy and lower reliance on story length compared to LLM-based scoring models. These findings provide a robust proof-of-concept for using simulated expert panels as a scalable alternative to (decontextualized) automated assessment methods, while reducing human raters’ burden and the logistical constraints of complex rating designs. Extensions of this work to different contexts, creativity tasks, and domains are necessary to gauge its generalizability.
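The evaluation design described in the abstract (fit a Random Forest on two features, then ask how the correlation with held-out ratings changes as the training set shrinks) can be illustrated with a minimal sketch. This is not the authors' code or data: the feature values, the formula generating the ratings, the hyperparameters, and the helper name `correlation_at_train_size` are all assumptions made for illustration, and only the sample size of 411 and the two feature names come from the abstract. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's data): 411 "stories", two features.
n_stories = 411
length = rng.normal(300, 80, n_stories)   # hypothetical word counts
dsi = rng.normal(0.80, 0.05, n_stories)   # hypothetical DSI scores
# Hypothetical "expert rating": an invented function of both features plus noise.
rating = 0.004 * length + 10 * dsi + rng.normal(0, 0.3, n_stories)

X = np.column_stack([length, dsi])


def correlation_at_train_size(n_train, seed=0):
    """Train on n_train randomly chosen stories; return Pearson r on the rest."""
    idx = np.random.default_rng(seed).permutation(n_stories)
    train, test = idx[:n_train], idx[n_train:]
    # Hyperparameters are illustrative assumptions, not the study's settings.
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X[train], rating[train])
    pred = model.predict(X[test])
    return float(np.corrcoef(pred, rating[test])[0, 1])


# Sweep a few training sizes, starting at the 25 stories named in the abstract.
results = {n: correlation_at_train_size(n) for n in (25, 50, 100, 200)}
for n, r in results.items():
    print(f"n_train={n:3d}  r={r:.2f}")
```

Because the data are simulated, the printed correlations say nothing about the study's reported r = 0.80; the sketch only shows the shape of a training-size sweep. A fuller version would repeat each training size over many random splits and report the spread.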
Baptiste Barbot
Yale University
Thomas Calogero Kiekens
UCLouvain
Behavioral Sciences
Barbot et al. studied this question.
synapsesocial.com/papers/69df2c9ee4eeef8a2a6b1cad — DOI: https://doi.org/10.3390/bs16040576