BIP! Finder - TinyBERT: Distilling BERT for Natural Language Understanding

2020 • TinyBERT: Distilling BERT for Natural Language Understanding

Authors: Jiao, Xiaoqi, Yin, Yichun, Shang, Lifeng, Jiang, Xin, Chen, Xiao, Li, Linlin, Wang, Fang, Liu, Qun

Venue: Findings of the Association for Computational Linguistics: EMNLP 2020

Type: Publication

Abstract: Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently execute them on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models. By leveraging this new KD method, the plen... (read more)

Topics: Artificial intelligence Machine learning Natural language processing

DOI: 10.18653/v1/2020.findings-emnlp.372 (Found 2 versions)

BIP! social metrics: 0 1
External links: Crossref OpenAIRE

BibTex PDF

Topic-specific impact indicators

Popularity: This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Citation Count: This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.