March 3, 2026 | Open Access

“Missing standardization”: Identifying harmful language ideologies in natural language processing work

Key Points

The findings reveal a rapid change in awareness of harmful language ideologies within natural language processing research, particularly regarding Indigenous languages.
In recent years, published scholarship has started to address biases linked to language ideologies, focusing on improving linguistic datasets.
Systematic search and content analysis of 20 years of literature provide insights into current practices and emerging frameworks for linguistic justice.
Integrating ethical protocols into language technology development may enhance equity and reduce harm to marginalized communities.

Abstract

Current practices in the field of natural language processing may reinforce stereotypes, stigmatize non-normative speech, and prevent access to public discourse online. Across Latin America, a rapid increase in digital material available in minoritized (mostly Indigenous) languages spoken in the region has been observed, and a number of Indigenous language corpora and language models are currently under development. Given that “poor data quality in critical areas can disproportionately impact vulnerable communities and situations,” it is important to examine the norms and assumptions embedded within the process for building linguistic datasets. Operationalizing and measuring harms have been the primary focus of work investigating bias in natural language processing, and linguistic justice has recently been proposed as a framework for identifying harmful language ideologies in natural language processing systems. This article explores whether and how harmful ideologies of language may be informing the work of natural language processing researchers working on minoritized Mexican languages, through a systematic search and content analysis of published scholarship on natural language processing covering a 20-year period. The findings show that the field is changing rapidly, with far greater awareness of potentially harmful language ideologies in recent years, and attempts to mitigate associated bias. This work also shows that the concepts of linguistic justice and language ideology provide a fruitful framework for understanding, and potentially guiding, the further integration of ethical protocols into the construction of language technologies.

Cite This Study

Melissa Gasparotto studied this question.

synapsesocial.com/papers/69a75d14c6e9836116a26893 https://doi.org/10.1177/20539517251406184
