Key points are not available for this paper at this time.
This article compares numeric assessments generated by ChatGPT and Claude along four dimensions of novelty, feasibility, impact, and disruption, to study their ability to rate ideas. We find that those chatbots make numeric assessments that are consistent with the expected relationships between those dimensions, for example, novelty is negatively correlated with feasibility. We also find that the two chatbots make statistically significantly different numeric assessments of the same idea information. We suggest that this type of analysis can also be used to provide a type of validation of underlying chatbot capabilities. In addition, we suggest that, as part of their chatbot requirements analysis, enterprises use this approach to ensure that the chatbot appropriately "understands" concepts, in which they are directly interested.
Daniel E. O’Leary (Wed,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: