A deep learning based method for extracting semantic information from patent documents

Liang Chen1 · Shuo Xu2 · Lijun Zhu1 · Jing Zhang1 · Xiaoping Lei1 · Guancan Yang3

Received: 2 January 2020
© Akadémiai Kiadó, Budapest, Hungary 2020

Abstract

Text-based patent analysis is grounded in information extraction techniques. However, such techniques suffer from obvious defects, such as a low degree of automation and unsatisfactory extraction accuracy. To deal with these problems, an information schema containing 17 types of entities and 15 types of semantic relations is first pre-defined, and a dataset of 1010 patent abstracts is annotated according to it and made freely available to the research community. Then, a novel patent information extraction framework is proposed, in which two deep learning models, BiLSTM-CRF and BiGRU-HAN, are used for entity identification and semantic relation extraction, respectively. Finally, to demonstrate the advantages of the new framework, extensive experiments are conducted, with the SAO method as the baseline at the framework level and the PCNNs model as the baseline at the module level. Experimental results show that our framework outperforms the traditional one in terms of automation and accuracy, and is capable of extracting fine-grained structured information from patent texts.

Keywords Patent analysis · Entity identification · Relation extraction · Deep learning · BiGRU-HAN · BiLSTM-CRF · Thin film head · SAO · PCNNs

* Shuo Xu

1 Institute of Scientific and Technical Information of China, Beijing 100038, People’s Republic of China

2 Research Base of Beijing Modern Manufacturing Development, College of Economics and Management, Beijing University of Technology, Beijing 100124, People’s Republic of China

3 School of Information Resource Management, Renmin University of China, Beijing 100872, People’s Republic of China

Scientometrics

Introduction

Patent documents are an important intellectual resource from which valuable technical intelligence can be obtained for technology opportunity discovery (Lee and Lee 2019), invention protection (Park et al. 2012), technology trend analysis (Han et al. 2017) and so on. So far, technical intelligence has mainly been obtained through expert reading (Yang 2012; Zhang 2016), which is laborious and inefficient, especially as the number of patents has increased dramatically in recent years due to rapid development in various technology areas. Automatic reading comprehension (Chen 2018) of patents for technical intelligence has therefore become a significant challenge for the entire patent system. Information extraction, armed with powerful machine learning methods, is one of the fundamental building blocks for computers to understand natural language, since it is capable of resolving the ambiguity inherent in free text by converting texts into semanti
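To make the two-stage framework described in the abstract more concrete (entity identification followed by semantic relation extraction), the following is a minimal sketch of the data flow only, not the authors' implementation. It assumes the entity tagger emits per-token BIO tags, which the text above does not state. The names `Entity`, `decode_bio`, `extract_relations` and `toy_classifier`, the entity type `COMPONENT`, the relation label `part-of`, and the example sentence are all hypothetical placeholders; the actual 17 entity types and 15 relation types are not listed in this section. A trained BiLSTM-CRF would supply the tags and a trained BiGRU-HAN would take the place of the toy rule.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Entity:
    text: str    # surface form, tokens joined by single spaces
    label: str   # entity type (placeholder names in this sketch)
    start: int   # index of the first token
    end: int     # index one past the last token


def decode_bio(tokens: Sequence[str], tags: Sequence[str]) -> List[Entity]:
    """Collapse per-token BIO tags (assumed tagger output) into entity spans."""
    entities: List[Entity] = []
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel flushes a span that reaches the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            entities.append(Entity(" ".join(tokens[start:i]), label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


Classifier = Callable[[Entity, Entity], Optional[str]]


def extract_relations(entities: Sequence[Entity],
                      classify: Classifier) -> List[Tuple[str, str, str]]:
    """Score every ordered entity pair and keep those with a predicted relation."""
    triples = []
    for head, tail in permutations(entities, 2):
        relation = classify(head, tail)
        if relation is not None:
            triples.append((head.text, relation, tail.text))
    return triples


def toy_classifier(head: Entity, tail: Entity) -> Optional[str]:
    # Placeholder rule, NOT a trained model: a later entity is "part-of" an earlier one.
    return "part-of" if head.start > tail.start else None


# Made-up example sentence with hand-written tags.
tokens = "the magnetic head comprises a coil layer".split()
tags = ["O", "B-COMPONENT", "I-COMPONENT", "O", "O", "B-COMPONENT", "I-COMPONENT"]
entities = decode_bio(tokens, tags)
triples = extract_relations(entities, toy_classifier)
print(triples)
```

Keeping the relation classifier as an injected function means either stage can be swapped independently, which mirrors the module-level comparison mentioned in the abstract.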
