July 17, 2023 · Open Access

Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators

Abstract

Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI’s ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as ‘open source’, many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human annotation labour is involved), and careful scientific documentation is exceedingly rare. Degrees of openness are relevant to fairness and accountability at all points, from data collection and curation to model architecture, and from training and fine-tuning to release and deployment.

Authors

Andreas Liesenfeld

IE University

Alianda Lopez

IE University

Mark Dingemanse

Radboud University Nijmegen
