Recent advances of neural text generation: Core tasks, datasets, models and challenges


https://doi.org/10.1007/s11431-020-1622-y

Special Topic: Natural Language Processing Technology

• Review •

Recent advances of neural text generation: Core tasks, datasets, models and challenges

JIN HanQi1,2,3, CAO Yue1,2,3, WANG TianMing1,3, XING XinYu1,3 & WAN XiaoJun1,2,3*

1 Wangxuan Institute of Computer Technology, Peking University, Beijing 100871, China;
2 Center for Data Science, Peking University, Beijing 100871, China;
3 The MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China

Received March 9, 2020; accepted May 6, 2020; published online September 15, 2020

In recent years, deep neural networks have achieved great success in solving many natural language processing tasks. In particular, substantial progress has been made on neural text generation, which takes linguistic or non-linguistic input and generates natural language text. This survey aims to provide an up-to-date synthesis of core tasks in neural text generation and the architectures adopted to handle these tasks, and to draw attention to the challenges in neural text generation. We first outline the mainstream neural text generation frameworks, and then introduce the datasets, advanced models and challenges of four core text generation tasks in detail, including AMR-to-text generation, data-to-text generation, and two text-to-text generation tasks (i.e., text summarization and paraphrase generation). Finally, we present future research directions for neural text generation. This survey can be used as a guide and reference for researchers and practitioners in this area.

Keywords: natural language generation, neural text generation, AMR-to-text, data-to-text, text summarization, paraphrase generation

Citation:

Jin H Q, Cao Y, Wang T M, et al. Recent advances of neural text generation: Core tasks, datasets, models and challenges. Sci China Tech Sci, 2020, 63, https://doi.org/10.1007/s11431-020-1622-y

1 Introduction

Natural language generation (NLG), or text generation, is a core subarea of natural language processing and artificial intelligence, and it has drawn increasing attention in recent years due to emerging and wide-ranging application requirements. Text generation techniques and systems can be used in various businesses and industries, including media, publishing, education, advertising, and e-commerce. For example, AI reporters based on text generation techniques have been successfully developed and used to automatically write and publish large volumes of sports and financial news articles.

Text generation systems usually take linguistic or non-linguistic inputs, and text generation tasks can be categorized based on the type of input: meaning-to-text generation [1–3] takes concept or meaning representations as input, a popular meaning representation being abstract meaning representation (AMR) [4]; data-to-text generation [5–7] takes structured data records or tables as input; text-to-text generation [8–13] takes natural language texts or sentences as input, and the typical tasks include text summarization and paraphrase generation.
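To make the meaning-to-text setting concrete, the sketch below shows the standard AMR graph for the sentence "The boy wants to go" (the canonical example from the AMR literature [4]) in PENMAN notation, alongside a minimal Python encoding of the same graph. The dictionary representation is an illustrative choice for this sketch, not part of any specific model or dataset discussed in this survey.

```python
# Canonical AMR example from the AMR literature [4] for "The boy wants to go."
# In PENMAN notation, variables (w, b, g) name concept instances, :ARGn edges
# are semantic roles, and the re-used variable b marks the boy as the one
# who goes (a reentrancy, which makes AMR a graph rather than a tree).
amr_penman = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
"""

# An equivalent, purely illustrative adjacency encoding: each node maps its
# concept and its outgoing role-labeled edges. An AMR-to-text model must
# verbalize such a graph as fluent text, e.g., "The boy wants to go."
amr_graph = {
    "w": {"concept": "want-01", "ARG0": "b", "ARG1": "g"},
    "b": {"concept": "boy"},
    "g": {"concept": "go-01", "ARG0": "b"},
}
```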