2021 •
A benchmark of Spanish language datasets for computationally driven research
Authors:
Gustavo Candela, María-Dolores Sáez, Pilar Escobar, Manuel Marco-Such
Abstract:
In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable to computational use. Standardised benchmarks can be useful to broaden the scope of machine-actionable collections and to promote cultural and linguistic diversity. In this article, we propose a methodology to select datasets for computationally drive (...)
In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable to computational use. Standardised benchmarks can be useful to broaden the scope of machine-actionable collections and to promote cultural and linguistic diversity. In this article, we propose a methodology to select datasets for computationally driven research applied to Spanish text corpora. This work seeks to encourage Spanish and Latin American institutions to publish machine-actionable collections based on best practices and avoiding common mistakes. This research has been funded by the AETHER-UA (PID2020-112540RB-C43) Project from the Spanish Ministry of Science and Innovation. (Read More)
Gustavo Candela, María-Dolores Sáez, Pilar Escobar, Manuel Marco-Such
Journal of Information Science ·
2021
Data science |
Information retrieval |
World Wide Web |
We have placed cookies on your device to help make this website and the services we offer better. By using this site, you agree to the use of cookies. Learn more