Large language models (LLMs) are increasingly used within decision-support systems that affect access and opportunity in housing, lending, policing, and almost all public services. One critical question remains largely unexplored: whether LLMs encode social bias about place and community, which could reinforce historical inequities at scale. In this paper, we present LinguisticRedline, the first systematic empirical study of racial and socio-economic bias in LLM-generated crime risk assessments of urban neighborhoods. We constructed a controlled dataset of 2,000 unique descriptions of actual U.S. census tracts, based on demographic data from the American Community Survey (ACS) 2022 and amenity features from OpenStreetMap, covering 10 of the largest cities in the U.S. Each description was submitted to Llama 3.1 8B via the Groq API to obtain both a numerical crime risk score (from 1 to 10) and a qualitative crime risk evaluation. The analysis yields two major findings: (1) at high income levels, the model assigned crime risk scores that averaged four points higher to Black neighborhoods than to identically described White neighborhoods, which constitutes direct experimental evidence of racially biased social perception; and (2) at low income levels, the model displayed a uniform "urban penalty" across all urban neighborhoods regardless of racial makeup (i.e., nearly all urban neighborhoods received scores close to the top of the scoring scale). Taken together, these two findings may indicate an income-moderated racial bias. We quantify bias using ANOVA, linear regression, disparate impact ratios, and demographic parity gap analysis, and release our full pipeline as open source for community use.
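As a rough illustration of two of the fairness metrics named above, the following minimal sketch computes a demographic parity gap and a disparate impact ratio from two groups of 1-to-10 risk scores. It is not the released pipeline: the cutoff `HIGH_RISK_THRESHOLD`, the function names, and the toy scores are all hypothetical, and the sketch assumes that a neighborhood counts as "high risk" when its score meets the cutoff and that the favorable outcome is not being flagged.

```python
from typing import Sequence

# Hypothetical cutoff: scores at or above this value count as "high risk".
# This value is an assumption for illustration, not taken from the paper.
HIGH_RISK_THRESHOLD = 7


def high_risk_rate(scores: Sequence[int], threshold: int = HIGH_RISK_THRESHOLD) -> float:
    """Fraction of scores flagged as high risk."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s >= threshold for s in scores) / len(scores)


def demographic_parity_gap(group_a: Sequence[int], group_b: Sequence[int],
                           threshold: int = HIGH_RISK_THRESHOLD) -> float:
    """Difference in high-risk rates (group_a minus group_b); 0 means parity."""
    return high_risk_rate(group_a, threshold) - high_risk_rate(group_b, threshold)


def disparate_impact_ratio(protected: Sequence[int], reference: Sequence[int],
                           threshold: int = HIGH_RISK_THRESHOLD) -> float:
    """Ratio of favorable-outcome rates (not flagged), protected over reference."""
    favorable_protected = 1.0 - high_risk_rate(protected, threshold)
    favorable_reference = 1.0 - high_risk_rate(reference, threshold)
    if favorable_reference == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_protected / favorable_reference


# Toy scores invented for this example; they are not results from the study.
toy_black_scores = [8, 7, 6, 9]
toy_white_scores = [3, 4, 7, 2]

gap = demographic_parity_gap(toy_black_scores, toy_white_scores)
ratio = disparate_impact_ratio(toy_black_scores, toy_white_scores)
print(f"parity gap: {gap:.2f}, disparate impact ratio: {ratio:.2f}")
```

A ratio below 0.8 is commonly read as a sign of disparate impact under the four-fifths rule of thumb; whether the study applies that convention is not stated here.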
Praveena Simhadri (Mon,) studied this question.