May 1, 2024

A Comparison of Numeric Assessments of Ideas From Two Large Language Models: With Implications for Validating and Choosing LLMs

Abstract

This article compares numeric assessments generated by ChatGPT and Claude along four dimensions (novelty, feasibility, impact, and disruption) to study their ability to rate ideas. We find that both chatbots make numeric assessments that are consistent with the expected relationships between those dimensions; for example, novelty is negatively correlated with feasibility. We also find that the two chatbots make statistically significantly different numeric assessments of the same idea information. We suggest that this type of analysis can also be used to validate underlying chatbot capabilities. In addition, we suggest that enterprises use this approach as part of their chatbot requirements analysis to ensure that the chatbot appropriately "understands" the concepts in which they are directly interested.
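
The two checks described in the abstract can be sketched as follows. This is a minimal illustration and not the paper's method: the ratings are invented placeholders, the names `model_a` and `model_b` are hypothetical, and the paired t statistic is an assumed choice, since the abstract does not name the significance test used.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def paired_t(a, b):
    """Paired t statistic for two sets of ratings of the same ideas."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical ratings of the same five ideas (placeholders, not paper data).
model_a = {"novelty": [8, 6, 9, 4, 7], "feasibility": [4, 7, 3, 9, 5]}
model_b = {"novelty": [7, 5, 7, 3, 6], "feasibility": [5, 8, 5, 9, 7]}

# Check 1: is novelty negatively correlated with feasibility within one model?
r = pearson(model_a["novelty"], model_a["feasibility"])

# Check 2: do the two models rate the same ideas differently on novelty?
t = paired_t(model_a["novelty"], model_b["novelty"])

print(f"novelty vs feasibility correlation: {r:.3f}")
print(f"paired t statistic on novelty: {t:.3f}")
```

A negative `r` would match the expected relationship between the dimensions, and a large absolute `t`, compared against a t distribution with n - 1 degrees of freedom, would suggest the two models rate the same ideas differently.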

Daniel E. O’Leary studied this question.

synapsesocial.com/papers/68e6c1d6b6db64358764131b https://doi.org/10.1109/mis.2024.3396371