Deep Learning in Lexical Analysis and Parsing

Wanxiang Che and Yue Zhang

Abstract Lexical analysis and parsing tasks model the deeper properties of words and their relationships to each other. The commonly used techniques involve word segmentation, part-of-speech tagging, and parsing. A typical characteristic of such tasks is that the outputs are structured. Two types of methods are usually used to solve these structured prediction tasks: graph-based methods and transition-based methods. Graph-based methods differentiate output structures directly based on their characteristics, while transition-based methods transform output construction into a state transition process and differentiate sequences of transition actions. Neural network models have been successfully used for both graph-based and transition-based structured prediction. In this chapter, we review the application of deep learning to lexical analysis and parsing, and compare it with traditional statistical methods.
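The transition-based view described in the abstract can be illustrated with a toy sketch of the arc-standard transition system for dependency parsing: the output tree is never scored as a whole, but is built by a sequence of SHIFT / LEFT-ARC / RIGHT-ARC actions over a stack and a buffer. The sentence and action sequence below are illustrative assumptions, not examples from the chapter.

```python
# Toy arc-standard transition system: a dependency tree is constructed
# by applying a sequence of actions to a stack and a buffer.

def parse(words, actions):
    stack, buffer, arcs = [], list(range(len(words))), []
    for act in actions:
        if act == "SHIFT":           # move the next buffer word onto the stack
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":      # second-top is a dependent of the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))  # (head index, dependent index)
        elif act == "RIGHT-ARC":     # top is a dependent of the second-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["She", "reads", "books"]
# Action sequence producing the tree: reads -> She, reads -> books
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
print(parse(words, actions))  # [(1, 0), (1, 2)]
```

A transition-based parser then reduces structured prediction to scoring which action to take in each state, rather than scoring whole output structures as graph-based methods do.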

4.1 Background

The properties of a word include its syntactic word categories (also known as parts of speech, POS), morphologies, and so on (Manning and Schütze 1999). Obtaining this information is also known as lexical analysis. For languages like Chinese, Japanese, and Korean, which do not separate words with whitespace, lexical analysis also includes the task of word segmentation, i.e., splitting a sequence of characters into words. Even in English, although whitespace is a strong clue for word boundaries, it is neither necessary nor sufficient. For example, in some situations we might wish to treat New York as a single word; this is regarded as a named entity recognition (NER) problem (Shaalan 2014). On the other hand, punctuation marks are always adjacent to words, and we also need to judge whether or not to segment them.

W. Che (B) Harbin Institute of Technology, Harbin, Heilongjiang, China, e-mail: [email protected]
Y. Zhang, Singapore University of Technology and Design, Singapore, Singapore, e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2018, L. Deng and Y. Liu (eds.), Deep Learning in Natural Language Processing, https://doi.org/10.1007/978-981-10-5209-5_4
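Word segmentation is commonly cast as character-level sequence labeling. As a minimal sketch (the B/M/E/S tag scheme, marking each character as the Begin, Middle, End, or Single character of a word, is a widely used convention; the example sentence is an illustrative assumption):

```python
# Word segmentation as character tagging with the B/M/E/S scheme.

def words_to_tags(words):
    """Convert a gold segmentation into per-character B/M/E/S tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

def tags_to_words(chars, tags):
    """Recover words from characters and their predicted tags."""
    words, buf = [], ""
    for c, t in zip(chars, tags):
        buf += c
        if t in ("E", "S"):   # a word ends at an E or S tag
            words.append(buf)
            buf = ""
    if buf:                   # flush any trailing partial word
        words.append(buf)
    return words

segmentation = ["我", "喜欢", "自然语言处理"]  # "I", "like", "NLP"
chars = [c for w in segmentation for c in w]
tags = words_to_tags(segmentation)
print(list(zip(chars, tags)))
print(tags_to_words(chars, tags))  # recovers the original segmentation
```

Under this encoding, a segmenter only has to predict one tag per character, and the structured output (the segmentation) is recovered deterministically from the tag sequence.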


For languages like English, this is often called tokenization, which is more a matter of convention than a serious research problem. Once we know some properties of words, we may be interested in the relationships between them. The parsing task is to find and label words (or sequences of words) that are related to each other compositionally or recursively (Jurafsky and Martin 2009). There are two commonly used parsing paradigms: phrase-structure (or constituency) parsing and dependency parsing. All of these tasks can be regarded as structured prediction problems, a term from supervised machine learning for problems whose outputs are structured and influence each other. Traditionally, large numbers of human-designed handcrafted features are fed into a linear classifier to predict a score for each decision unit, and these scores are then combined to find the best overall structure.
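The traditional linear-model recipe just described can be sketched for POS tagging: each decision unit (tagging one word) is scored by handcrafted indicator features against a weight vector, and the decision scores are summed into a score for the whole structure. The tag set, features, and weights below are illustrative assumptions; a real system would learn the weights and search over tag sequences rather than decode greedily.

```python
# Minimal linear-model sketch: handcrafted features, a weight vector,
# per-decision scores, and a structure score that sums them.

TAGS = ["DT", "NN", "VBZ"]

def features(words, i, tag):
    """Handcrafted indicator features for tagging words[i] as tag."""
    feats = [f"word={words[i]}|tag={tag}", f"tag={tag}"]
    if words[i].endswith("s"):
        feats.append(f"suffix=s|tag={tag}")
    return feats

def score(feats, weights):
    """Dot product of sparse indicator features with the weight vector."""
    return sum(weights.get(f, 0.0) for f in feats)

def greedy_tag(words, weights):
    """Pick the best tag per word; the structure score sums decision scores."""
    total, tags = 0.0, []
    for i in range(len(words)):
        best = max(TAGS, key=lambda t: score(features(words, i, t), weights))
        total += score(features(words, i, best), weights)
        tags.append(best)
    return tags, total

weights = {"word=the|tag=DT": 2.0, "word=dog|tag=NN": 2.0,
           "suffix=s|tag=VBZ": 1.0}
print(greedy_tag(["the", "dog", "runs"], weights))
```

Deep learning methods replace the handcrafted feature templates with automatically learned dense representations, while keeping this score-and-combine view of structured prediction.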