What type of study is this?

This is a quantitative study.

October 3, 2025 | Open Access

Gender Bias in Computer-generated Thesauri: The Case of the Serbian Section of Kontekst.io, a Thesaurus of Synonyms and Semantically Related Terms

Key points

Gender bias is prevalent in four selected entries of the thesaurus, indicating systemic issues in language processing.
The analysis reveals deeper gender biases than previously identified in traditional dictionaries and NLP studies.
The study utilizes semantic analysis to group biased terms into various fields, enhancing understanding of their usage.
Recommendations are provided to improve the lexicographic quality of the thesaurus based on the findings.

Abstract

This paper studies gender bias in the computer-generated thesaurus Kontekst.io, a search portal for synonyms and semantically related terms in Serbian, Croatian and Slovenian. Its Serbian section, which is the focus here, is based on a natural language processing (NLP) technique called word embeddings and on a large internet corpus of Serbian. Gender bias is uncovered in four selected entries of this thesaurus: žena (woman), muškarac (man), d(j)evojka (young woman) and momak (young man). The entries are first analysed semantically, and the terms found are grouped into semantic fields. Then, in the vein of earlier studies of gender bias in traditional dictionaries and of critical discourse analysis, the gender bias in the selected entries is analysed. The results show that gender bias is ubiquitous and that it extends deeper than earlier studies of gender bias in word embeddings have shown. We then give recommendations for improving this lexicographic product based on the results.

Keywords: gender bias, computer-generated thesaurus, word embeddings, Kontekst.io, Serbian, lexicography
