Abstract: Tolosa Treebank for Occitan. Tolosa Treebank is the first dependency treebank for Occitan, developed as part of the EFA 227/16 LINGUATEC project, financed by POCTEFA Interreg European funds. The current version of the treebank contains 25K tokens annotated for PoS tags, lemmas, and syntactic dependencies. Linguistic annotation follows the Universal Dependencies guidelines (https://universaldependencies.org/#language-u). The corpus files are stored in the CoNLL-U format. Each sentence is preceded by a sentence ID and the original, non-tokenized te...
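As a rough illustration of the file layout described in the abstract, the sketch below reads CoNLL-U text into sentences. It assumes the standard Universal Dependencies conventions: `# sent_id = ...` and `# text = ...` comment lines before each sentence, ten tab-separated columns per token, and a blank line between sentences. The function name `parse_conllu`, the `FIELDS` list, and the sample sentence are hypothetical; the sample is invented for this example and is not taken from the treebank.

```python
# Column names follow the CoNLL-U specification (10 tab-separated fields).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Split CoNLL-U text into sentences with metadata and token rows.

    Hypothetical helper: returns a list of dicts with keys "meta"
    (comment-line key/value pairs) and "tokens" (one dict per token line).
    """
    sentences = []
    meta, tokens = {}, []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append({"meta": meta, "tokens": tokens})
            meta, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines such as "# sent_id = ..." carry sentence metadata.
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
        else:
            tokens.append(dict(zip(FIELDS, line.split("\t"))))
    if tokens:
        # Handle a final sentence with no trailing blank line.
        sentences.append({"meta": meta, "tokens": tokens})
    return sentences


# Invented sample for illustration only (not from the treebank).
SAMPLE = (
    "# sent_id = example-1\n"
    "# text = Lo gat dormís.\n"
    "1\tLo\tlo\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tgat\tgat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tdormís\tdormir\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        print(sentence["meta"]["sent_id"], sentence["meta"]["text"])
        for tok in sentence["tokens"]:
            print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

All fields are kept as strings here. A fuller reader would also need to handle multiword-token ranges and empty nodes, which this sketch does not treat specially.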