Pre-trained models for natural language processing: A survey


https://doi.org/10.1007/s11431-020-1647-3

Special Topic: Natural Language Processing Technology

Review

QIU XiPeng1,2*, SUN TianXiang1,2, XU YiGe1,2, SHAO YunFan1,2, DAI Ning1,2 & HUANG XuanJing1,2

1 School of Computer Science, Fudan University, Shanghai 200433, China;
2 Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 200433, China

Received March 9, 2020; accepted May 21, 2020; published online September 15, 2020

Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) into a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy from four different perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is intended as a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.

Keywords: deep learning, neural network, natural language processing, pre-trained model, distributed representation, word embedding, self-supervised learning, language modelling

Citation: Qiu X P, Sun T X, Xu Y G, et al. Pre-trained models for natural language processing: A survey. Sci China Tech Sci, 2020, 63, https://doi.org/10.1007/s11431-020-1647-3

1 Introduction

With the development of deep learning, various neural networks have been widely used to solve natural language processing (NLP) tasks, such as convolutional neural networks (CNNs) [1–3], recurrent neural networks (RNNs) [4, 5], graph-based neural networks (GNNs) [6–8] and attention mechanisms [9, 10]. One of the advantages of these neural models is their ability to alleviate the feature engineering problem. Non-neural NLP methods usually rely heavily on discrete handcrafted features, while neural methods usually use low-dimensional, dense vectors (a.k.a. distributed representations) to implicitly represent the syntactic or semantic features of the language. These representations are learned in specific NLP tasks. Therefore, neural methods make it easy for people to develop various NLP systems.

Despite the success of neural models for NLP tasks, the performance improvement may be less significant compared with the computer vision (CV) field. The main reason is that current datasets for most supervised NLP tasks (except machine translation) are rather small. Deep neural networks usually have a large number of parameters, which makes them overfit on such small training sets and generalize poorly in practice. Therefore, the early neural models for many NLP tasks were relatively shallow and usually consisted of only 1–3 neural layers.

Recently, substantial work has shown that pre-trained models (PTMs) trained on large corpora can learn universal language representations, which are beneficial for downstream NLP tasks a