Abstract: When developing NLP tools for low-resource languages, one is often confronted with the lack of annotated data. We propose to circumvent this bottleneck by training a supervised HMM tagger on a closely related language for which annotated data are available, and translating the words in the tagger parameter files into the low-resource language. The translation dictionaries are created with unsupervised lexicon induction techniques that rely only on raw textual data. We obtain a tagging accuracy of up to 89.08% using a Spanish tagger adapted to C...
(read more)
Topics: 
Natural language processing
Artificial intelligence
Speech recognition