A deep learning based method for extracting semantic information from patent documents


Liang Chen¹ · Shuo Xu² · Lijun Zhu¹ · Jing Zhang¹ · Xiaoping Lei¹ · Guancan Yang³

Received: 2 January 2020
© Akadémiai Kiadó, Budapest, Hungary 2020

Abstract

Text-based patent analysis is grounded in information extraction techniques. However, such techniques suffer from obvious defects, such as a low degree of automation and unsatisfactory extraction accuracy. To deal with these problems, an information schema containing 17 types of entities and 15 types of semantic relations is first pre-defined, and a dataset of 1010 patent abstracts is annotated accordingly and made freely available to the research community. Then, a novel patent information extraction framework is proposed, in which two deep-learning models, BiLSTM-CRF and BiGRU-HAN, are used for entity identification and semantic relation extraction, respectively. Finally, to demonstrate the advantages of the new framework, extensive experiments are conducted, with the SAO method and the PCNNs model taken as baselines at the framework and module levels, respectively. Experimental results show that our framework outperforms the traditional one in terms of automation and accuracy, and is capable of extracting fine-grained structured information from patent texts.

Keywords: Patent analysis · Entity identification · Relation extraction · Deep learning · BiGRU-HAN · BiLSTM-CRF · Thin film head · SAO · PCNNs

* Corresponding author: Shuo Xu

¹ Institute of Scientific and Technical Information of China, Beijing 100038, People’s Republic of China

² Research Base of Beijing Modern Manufacturing Development, College of Economics and Management, Beijing University of Technology, Beijing 100124, People’s Republic of China

³ School of Information Resource Management, Renmin University of China, Beijing 100872, People’s Republic of China




Scientometrics

Introduction

Patent documents are an important intellectual resource, from which valuable technical intelligence can be obtained for technology opportunity discovery (Lee and Lee 2019), invention protection (Park et al. 2012), technology trend analysis (Han et al. 2017) and so on. So far, however, technical intelligence has mainly been obtained through expert reading (Yang 2012; Zhang 2016), which is laborious and inefficient, especially as the number of patents has increased dramatically in recent years owing to rapid development in various technology areas. Automatic reading comprehension (Chen 2018) of patents for technical intelligence has therefore become a significant challenge for the entire patent system. Information extraction, armed with powerful machine learning methods, is one of the fundamental building blocks for computers to understand natural language, since it is capable of resolving the ambiguity inherent in free text by converting texts into semanti