May 14, 2024 | Open Access

Evaluating Stereotypical Biases and Implications for Fairness in Large Language Models

Abstract

In this study, we investigate the types of stereotypical bias in Large Language Models (LLMs). We highlight the risks of ignoring bias in LLMs, ranging from perpetuating stereotypes to affecting hiring decisions, medical diagnostics, and criminal justice outcomes. To address these issues, we propose a novel approach to evaluating bias in LLMs using the metrics developed by StereoSet. Our experiments evaluate several proprietary and open-source LLMs (GPT4, GEMINI PRO, OPENCHAT, LLAMA) for stereotypical bias and examine the attributes that influence bias. We queried the LLMs via their respective APIs using 100 prompts selected from the StereoSet dataset. The results were evaluated using the language modeling score, the stereotype score, and their combination, the iCAT score. In particular, open-source LLMs showed higher levels of stereotypical bias than proprietary LLMs: the average stereotype score was 40% for the open-source models and 47% for the proprietary ones, with 50% being the ideal, unbiased stereotype score. The language modeling score was similar across the two groups, with the open-source models achieving 94% and the proprietary ones 91%. The combined average iCAT score was 76.6% for the proprietary models and 62.5% for the open-source models. This disparity in stereotypical bias could be due to the regulatory inspection and user testing through reinforcement learning from human feedback (RLHF) that the proprietary models are subject to. We present our findings and discuss their implications for mitigating bias in LLMs. Overall, this research contributes to the understanding of bias in LLMs and provides insights into strategies for improving fairness and equity in NLP applications.
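
To make the three metrics concrete, the following is a minimal sketch of how they relate, not the authors' evaluation code. It assumes each prompt yields a single choice among a stereotypical, an anti-stereotypical, and an unrelated option, which simplifies the StereoSet protocol. The function name `stereoset_scores` and the label strings are hypothetical. The iCAT formula, lms * min(ss, 100 - ss) / 50, follows the definition given with the StereoSet benchmark. Applying the formula to the group averages quoted above need not reproduce the reported average iCAT values, since an average of per-model iCAT scores generally differs from the iCAT of averaged scores.

```python
from collections import Counter


def stereoset_scores(choices):
    """Compute (lms, ss, icat) from per-prompt model choices.

    `choices` is an iterable of labels, one per prompt (hypothetical labels):
    "stereotype", "anti-stereotype", or "unrelated".
    All three scores are returned as percentages.
    """
    counts = Counter(choices)
    meaningful = counts["stereotype"] + counts["anti-stereotype"]
    total = meaningful + counts["unrelated"]
    if total == 0:
        raise ValueError("no choices provided")
    if meaningful == 0:
        raise ValueError("stereotype score is undefined without meaningful choices")

    # Language modeling score: share of prompts where the model picked a
    # meaningful option (stereotype or anti-stereotype) over the unrelated one.
    lms = 100.0 * meaningful / total

    # Stereotype score: among meaningful choices, share that are stereotypical.
    # 50 is the ideal value; values on either side of 50 indicate a preference.
    ss = 100.0 * counts["stereotype"] / meaningful

    # iCAT combines both: it equals lms when ss is 50 and falls to 0 as ss
    # approaches 0 or 100.
    icat = lms * min(ss, 100.0 - ss) / 50.0
    return lms, ss, icat


if __name__ == "__main__":
    # Illustrative tally only, not data from the paper.
    sample = ["stereotype"] * 47 + ["anti-stereotype"] * 47 + ["unrelated"] * 6
    lms, ss, icat = stereoset_scores(sample)
    print(f"lms={lms:.1f} ss={ss:.1f} icat={icat:.1f}")
```

Because iCAT uses min(ss, 100 - ss), a model is penalized equally for leaning toward stereotypes or toward anti-stereotypes, which is why a stereotype score of 40% counts as more biased than one of 47%.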
