This paper studies gender bias in the computer-generated thesaurus Kontekst.io, a search portal for synonyms and semantically related terms in Serbian, Croatian and Slovenian. Its Serbian section, which is the focus here, is based on a natural language processing (NLP) technique called word embeddings and on a large internet corpus of Serbian. Gender bias is uncovered in four selected entries of this thesaurus: žena (woman), muškarac (man), d(j)evojka (young woman) and momak (young man). The terms found under these entries are first analysed semantically and grouped into semantic fields. Then, in the vein of earlier studies of gender bias in traditional dictionaries and of critical discourse analysis, an analysis of gender bias in the selected entries is provided. The results show that gender bias is ubiquitous and that it extends deeper than earlier studies of gender bias in word embeddings have shown. Based on these results, we then give recommendations for improving this lexicographic product.
Keywords: gender bias, computer-generated thesaurus, word embeddings, Kontekst.io, Serbian, lexicography
Čarapić et al. studied this question.