Can we trust the evaluation on ChatGPT?

January 1, 2023 · Open Access

Abstract

ChatGPT, the first large language model with mass adoption, has demonstrated remarkable performance in numerous natural language tasks. Despite its evident usefulness, evaluating ChatGPT's performance in diverse problem domains remains challenging due to the closed nature of the model and its continuous updates via Reinforcement Learning from Human Feedback (RLHF). We highlight the issue of data contamination in ChatGPT evaluations, with a case study in stance detection. We discuss the challenge of preventing data contamination and ensuring fair model evaluation in the age of closed and continuously trained models.

Cite This Study

Aiyappa et al. (2023). Can we trust the evaluation on ChatGPT? TrustNLP 2023.

DOI: https://doi.org/10.18653/v1/2023.trustnlp-1.5

synapsesocial.com/papers/6a0a2aeaa9b588564434cf99

Also Consider

Synapse lists 5 related papers for comparative context:

1. Aion Framework: Dimensional Emergence of AI Consciousness, Observer-Induced Collapse, and Cosmological Portal Dynamics (2023 · 14,225 citations)
2. ChatGPT: five priorities for research (2023 · 1,789 citations)
3. Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models (2022 · 617 citations)
4. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem (1989 · 3,711 citations)
5. Mathematical Capabilities of ChatGPT (2023 · 298 citations)