Abstract: Linguatec Tolosa Treebank for Occitan. Linguatec Tolosa Treebank is the first dependency treebank for Occitan, developed as part of the EFA 227/16 LINGUATEC Project, financed by the POCTEFA Interreg European funds. The current version of the treebank contains 13K tokens annotated for PoS tags, lemmas and syntactic dependencies. Linguistic annotation follows the Universal Dependencies guidelines (https://universaldependencies.org/#language-u). A detailed corpus description is provided in the description file. A subset of texts was doubly annotated an...
Topics: Natural language processing