2021 •
Slovene and Croatian word embeddings in terms of gender occupational analogies
Authors:
Matej Ulčar, Anka Supej, Marko Robnik-Šikonja, Senja Pollak
Abstract: In recent years, the use of deep neural networks and dense vector embeddings for text representation has led to excellent results in the field of computational understanding of natural language. It has also been shown that word embeddings often capture gender, racial and other types of bias. The article focuses on evaluating Slovene and Croatian word embeddings in terms of gender bias using word analogy calculations. We compiled a list of masculine and feminine nouns for occupations in Slovene and evaluated the gender bias of fastText, word2vec and ELMo embeddings with different configurations and different approaches to analogy calculations. The lowest occupational gender bias was observed with the fastText embeddings. Similarly, we compared different fastText embeddings on Croatian occupational analogies.
Slovenščina 2.0: empirical, applied and interdisciplinary research · 2021
Natural language processing | Artificial intelligence | Linguistics