People are increasingly using Large Language Models (LLMs) for a sense of “localness,” yet their ability to accurately and equitably represent local knowledge remains unexamined. To investigate this, we conducted a large-scale evaluation using a benchmark of over 12,000 question-answer pairs spanning structured census data, local news, and social media. Our results show that performance is strongly shaped by data modality: structured tasks expose deep limitations in numerical reasoning and calibration, while open-ended prompts reveal a clear performance hierarchy favoring informal user-generated content over professionally edited prose. Our primary finding is the existence of deep, context-dependent disparities that affect communities differently. We uncover a dual geographic bias: in formal news contexts, models exhibit a strong “urban advantage,” leaving rural areas systematically underrepresented with lower semantic depth. Conversely, in social media data, models suffer an “urban penalty,” struggling to navigate the conversational complexity and slang of high-density areas. This indicates that while rural locales face a “poverty of data,” highly documented urban centers face a “poverty of precision.” We also identify a domain bias: models are more adept at handling concrete, physical questions but consistently struggle to capture the nuanced relational and cognitive dimensions of a community. This work provides the first systematic audit of localness disparities in LLMs, revealing how they reflect and risk amplifying real-world inequities. Achieving equitable local representation requires moving beyond passive evaluation to active intervention. We call for a concerted effort from the CSCW community to build richer and more ethical datasets, design interfaces that prioritize user verification over blind trust, and architect AI systems for deeper and more just engagement with place.
Gao et al. studied this question.