Abstract

Large language models (LLMs), particularly Generative Pre-trained Transformer (GPT)-based systems, introduce ethical issues across model development, user interaction, and domain-specific use. Through a multi-level thematic synthesis of the academic literature, we examine ethical issues in GPT-based LLM systems at three levels: (a) the foundation model level, (b) the user-facing artifact level, with a focus on ChatGPT, and (c) the domain-specific application level, covering education, healthcare, and business. The synthesis reveals three ethical issues common to the foundation model and user-facing artifact levels: bias and fairness, privacy and security, and hallucinations and content integrity. At the user-facing artifact level, phishing, trust, and transparency emerge as additional ethical issues. At the domain-specific application level, the reviewed literature highlights issues related to academic integrity and learning in education, patient safety and confidentiality in healthcare, and organizational decision-making, confidentiality, and overreliance in business. Our findings show how ethical issues can originate at one level and become amplified, transformed, or newly instantiated at another. We also identify an architectural manifestation of the Collingridge dilemma: ethical issues are difficult to anticipate and control at the foundation model level, yet become harder to address once GPT-based LLM systems are embedded in user-facing artifacts and domain-specific applications. Based on the synthesis, we propose five future research directions: (1) organizational deployment, management, and governance of GPT-based LLM applications, (2) user skills and training, (3) multi-actor coordination in the LLM governance ecosystem, (4) transmission of ethical issues across foundation model, artifact, and application levels, and (5) trust and expertise boundaries in domain-specific applications.
The study contributes a structured framework for analyzing ethical issues in GPT-based LLM systems and clarifies how these issues emerge across different architectural levels.
Pervez et al. studied this question.