A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset


Cinthia M. Souza¹ · Magali R. G. Meireles¹ · Paulo E. M. Almeida²

Received: 23 December 2019
© Akadémiai Kiadó, Budapest, Hungary 2020

Abstract

Patents are an important source of information for measuring the technological advancement of a specific knowledge domain. To facilitate the search for information in patent datasets, classification systems separate documents into groups according to the area of knowledge and assign names that describe their content. The growing number of patented inventions creates the need to subdivide these groups. Since these groups belong to a restricted knowledge domain, naming the generated subcategories can be extremely laborious. This work compares the performance of abstractive and extractive summarization techniques in the task of generating sentences directly associated with the content of patents. The abstractive summarization model was composed of a Seq2Seq architecture and an LSTM network. Training was conducted with a dataset of patent titles and abstracts, and validation was performed using the ROUGE set of metrics. The results obtained by the generated model were compared with the sentence resulting from an extractive summarization algorithm applied to the task of naming patent groups. The main idea was to help the specialist name new patent groups created by clustering systems. The naming experiments were performed on a dataset of abstracts of patent documents. Comparative experiments were conducted using four subgroups of the United States Patent and Trademark Office, which uses the Cooperative Patent Classification system.

Keywords Computational intelligence · Knowledge representation · Information systems · Automatic text summarization · Patent datasets
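As an illustration of the kind of score the ROUGE family reports, the following is a minimal sketch of ROUGE-N computed as clipped n-gram overlap between a generated label and a reference title. It is not the authors' evaluation code: the function names `ngrams` and `rouge_n`, the lowercase whitespace tokenization, and the example strings are all assumptions made for this sketch, and the excerpt does not state which ROUGE variants or preprocessing the study used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F1 from clipped n-gram overlap.

    Assumption: lowercase whitespace tokenization, no stemming and
    no stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical generated label and reference title.
    generated = "method for wireless data transmission"
    reference = "wireless data transmission method"
    print(rouge_n(generated, reference, n=1))
    print(rouge_n(generated, reference, n=2))
```

In this sketch, recall measures how much of the reference title is covered by the generated sentence, and precision measures how much of the generated sentence appears in the reference.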

* Magali R. G. Meireles [email protected]

1 Pontifical Catholic University of Minas Gerais, Belo Horizonte, MG, Brazil

2 Federal Center for Technological Education of Minas Gerais, Belo Horizonte, MG, Brazil


Scientometrics

Introduction

Patents are an important knowledge source and, therefore, their analysis has been considered a useful tool for research and for management development. Patents are one of the most effective ways to protect an invention today (Wang et al. 2019). One of the objectives of granting patents is to facilitate the dissemination of scientific knowledge (Ouellette 2017). However, finding information in these documents is becoming an increasingly complex task due to the large number of patents in datasets (Sjögren et al. 2018). These documents use complex language, with excessive descriptive technical detail and idiosyncrasies related to the structure of the patent document and the length of its sentences. Consequently, retrieving and analyzing these documents is time consuming and laborious (Codina-Filbà et al. 2017; Gomez 2019). The efficient analysis of these documents allows for monitori
