BIP! Finder - A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

2020 • A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

Authors: Pedro Javier Ortiz Suárez; Laurent Romary; Benoît Sagot

Venue: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Type: Publication

Abstract: We use the multilingual OSCAR corpus, extracted from Common Crawl via language classification, filtering and cleaning, to train monolingual contextualized word embeddings (ELMo) for several mid-resource languages. We then compare the performance of OSCAR-based and Wikipedia-based ELMo embeddings for these languages on the part-of-speech tagging and parsing tasks. We show that, despite the noise in the Common-Crawl-based OSCAR data, embeddings trained on OSCAR perform much better than monolingual embeddings trained on Wik...

Topics: Natural language processing; Artificial intelligence

DOI: 10.18653/v1/2020.acl-main.156

Topic-specific impact indicators

Popularity: This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Citation Count: This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.