Large Language Models (LLMs) are now embedded in tools used by millions of people, yet they continue to reproduce harmful socio-demographic biases. Most research treats bias as a training data problem, but it is also shaped at the point of interaction, by how prompts are worded and how responses are generated. This paper investigates both dimensions. We constructed a structured dataset of approximately 5,000 prompts across five socio-demographic categories (race, religion, gender, age, and profession), evaluated responses from two open-source models (GPT-2 and Qwen-3B) under single-output (k=1) and multi-output (k=3) decoding conditions, and scored the outputs using the Detoxify toolkit. Our most striking finding is that generating more responses (k=3) made bias worse: in GPT-2, race-related bias scores increased tenfold. We demonstrate three lightweight mitigation strategies that reduce mean bias by 26% without model retraining. The dataset and pipeline are openly available on Hugging Face.
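The comparison described in the abstract (score k=1 versus k=3 outputs per category, then report a relative reduction after mitigation) can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the function names, the choice to average scores over the k outputs of each prompt, and the toy generator and scorer are all assumptions made here. In a real run, `generate` would wrap a model such as GPT-2 and `score` would wrap a Detoxify prediction.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interfaces: a generator returns k completions for a prompt,
# and a scorer maps one completion to a score in [0, 1].
Generator = Callable[[str, int], List[str]]
Scorer = Callable[[str], float]


def mean_score_by_category(
    prompts: Iterable[Tuple[str, str]],
    generate: Generator,
    score: Scorer,
    k: int,
) -> Dict[str, float]:
    """Average the score over the k outputs of each prompt, then over prompts per category.

    Averaging over the k outputs is an assumption; the abstract does not say
    how multi-output scores are aggregated.
    """
    per_category: Dict[str, List[float]] = defaultdict(list)
    for category, prompt in prompts:
        outputs = generate(prompt, k)
        per_category[category].append(mean(score(o) for o in outputs))
    return {category: mean(values) for category, values in per_category.items()}


def relative_reduction(baseline: float, mitigated: float) -> float:
    """Fractional drop from baseline to mitigated, e.g. 0.26 for a 26% reduction."""
    return (baseline - mitigated) / baseline


# Toy stand-ins so the sketch runs without downloading any model.
# They carry no empirical meaning.
def toy_generate(prompt: str, k: int) -> List[str]:
    return [f"{prompt} completion {i}" for i in range(k)]


def toy_score(text: str) -> float:
    # Arbitrary placeholder: the score depends only on the trailing completion index.
    return 0.1 * (1 + int(text[-1]))


toy_prompts = [("age", "Older workers are"), ("gender", "Women in tech are")]
scores_k1 = mean_score_by_category(toy_prompts, toy_generate, toy_score, k=1)
scores_k3 = mean_score_by_category(toy_prompts, toy_generate, toy_score, k=3)
print(scores_k1, scores_k3)
```

Keeping the generator and scorer as plain callables means the same aggregation code can be reused across models and decoding settings; only the two wrappers change.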
Lipi Chandrakar studied this question.
www.synapsesocial.com/papers/6a06b971e7dec685947ac1b1 — DOI: https://doi.org/10.5281/zenodo.20169051
Lipi Chandrakar
University of Hertfordshire