Abstract: The problem of discriminating between similar languages and dialects is one of the current challenges of natural language processing. In this paper, we describe the collection of a bidialectal corpus of Greek and the construction of a classifier to distinguish between Cypriot Greek (CG) and Standard Modern Greek (SMG). The corpus of CG and SMG was compiled from social media websites such as Facebook, Twitter and online forums. N-gram features were extracted and three classification algorithms were applied and tested on labeled sentences: multin...
Topics: 
Artificial intelligence
Natural language processing
Machine learning
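
The n-gram feature extraction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say whether character or word n-grams were used, which orders of n were chosen, or how the features were weighted, so the function names (`char_ngrams`, `word_ngrams`, `ngram_counts`), the default orders, and the whitespace tokenization below are all assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, splitting on whitespace.

    Whitespace tokenization is an assumption; the abstract does not
    describe the tokenizer used.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n_values=(1, 2, 3), unit="char"):
    """Count n-grams of every order in `n_values` for one sentence.

    The result is a sparse bag-of-n-grams vector (n-gram -> count) of the
    kind a sentence-level classifier could take as input.
    """
    extract = char_ngrams if unit == "char" else word_ngrams
    counts = Counter()
    for n in n_values:
        counts.update(extract(text, n))
    return counts
```

Each labeled sentence would be mapped to such a count vector before being passed to a classifier. Character n-grams are a common choice for separating closely related varieties because they capture spelling and morphological differences without requiring a tokenizer.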