https://doi.org/10.1007/s11431-020-1647-3
Special Topic: Natural Language Processing Technology
Review
Pre-trained models for natural language processing: A survey

QIU XiPeng1,2*, SUN TianXiang1,2, XU YiGe1,2, SHAO YunFan1,2, DAI Ning1,2 & HUANG XuanJing1,2

1 School of Computer Science, Fudan University, Shanghai 200433, China;
2 Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 200433, China
Received March 9, 2020; accepted May 21, 2020; published online September 15, 2020
Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy from four different perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is intended to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.

Keywords: deep learning, neural network, natural language processing, pre-trained model, distributed representation, word embedding, self-supervised learning, language modelling
Citation: Qiu X P, Sun T X, Xu Y G, et al. Pre-trained models for natural language processing: A survey. Sci China Tech Sci, 2020, 63. https://doi.org/10.1007/s11431-020-1647-3
1 Introduction

With the development of deep learning, various neural networks have been widely used to solve natural language processing (NLP) tasks, such as convolutional neural networks (CNNs) [1–3], recurrent neural networks (RNNs) [4, 5], graph-based neural networks (GNNs) [6–8] and attention mechanisms [9, 10]. One of the advantages of these neural models is their ability to alleviate the feature engineering problem. Non-neural NLP methods usually rely heavily on discrete handcrafted features, whereas neural methods typically use low-dimensional dense vectors (a.k.a. distributed representations) to implicitly represent the syntactic or semantic features of the language. These representations are learned in specific NLP tasks. Therefore, neural methods make it easy to develop various NLP systems.

Despite the success of neural models for NLP tasks, the performance improvement may be less significant compared
with that in the computer vision (CV) field. The main reason is that current datasets for most supervised NLP tasks are rather small (except for machine translation). Deep neural networks usually have a large number of parameters, which makes them overfit on these small training datasets and generalize poorly in practice. Therefore, the early neural models for many NLP tasks were relatively shallow and usually consisted of only 1–3 neural layers.

Recently, substantial work has shown that pre-trained models (PTMs) trained on large corpora can learn universal language representations that are beneficial for downstream NLP tasks and can avoid training a new model from scratch.
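As an illustration (a minimal sketch, not taken from this survey), the following Python snippet loads a publicly released PTM with the Hugging Face transformers library and uses it to produce dense contextual representations for a sentence. The checkpoint name "bert-base-uncased", the library calls, and the example sentence are assumptions of this sketch; any comparable PTM could be substituted.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Load a pre-trained encoder and its tokenizer (checkpoint name is illustrative).
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    sentence = "Pre-trained models have brought NLP to a new era."
    inputs = tokenizer(sentence, return_tensors="pt")

    with torch.no_grad():
        outputs = encoder(**inputs)

    # Each token is now a low-dimensional dense vector (768 dimensions for BERT-base),
    # i.e., a distributed representation, rather than a sparse handcrafted feature vector.
    token_vectors = outputs.last_hidden_state   # shape: (1, num_tokens, 768)
    sentence_vector = token_vectors[:, 0]       # the [CLS] vector, often used for classification

In practice, a downstream system typically adds a small task-specific layer on top of such representations and fine-tunes it (or the whole encoder) on the target task, rather than training a deep model from scratch on a small labelled dataset.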