Abstract: This article presents preliminary findings from a multi-year, multi-disciplinary text analysis project using an ancient and medieval Chinese corpus of over five million characters in works that date from the earliest received texts to the Song dynasty. It describes “distant reading” methods in the humanities and the authors’ corpus; introduces topic-modeling procedures; answers questions about the authors’ data; discusses complementary relationships between machine learning and human expertise; explains topics represented in Analects, Me...
Topics: 
Linguistics
Natural language processing
Artificial intelligence