Diverse feature set based Keyphrase extraction and indexing techniques


Saurabh Sharma 1 & Vishal Gupta 1 & Mamta Juneja 1

Received: 16 October 2019 / Revised: 5 June 2020 / Accepted: 21 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

The internet has changed the way people communicate, producing a vast amount of text in electronic form: e-mail, technical and scientific reports, tweets, physician notes and military field reports. Providing key-phrases for these extensive text collections allows users to grasp the essence of lengthy content quickly and to locate information efficiently. When designing a keyword extraction and indexing system, it is essential to pick distinguishing properties, called features. In this article, we propose several unsupervised keyword extraction approaches that are independent of the structure, size and domain of the documents. The proposed method relies on a novel, cognitively inspired set of standard, phrase, word embedding and external knowledge source features. Results for individual and selected features are reported through experiments on four datasets, viz. SemEval, KDD, Inspec and DUC. The selected (feature selection) and word embedding based features are the best feature sets for keyword extraction and indexing across all of these datasets. That is, the proposed distributed word vector with additional knowledge improves the results significantly over individual features, over combined features after feature selection, and over the state of the art. Having developed these keyphrase extraction methods, we also applied them to a document classification task.

Keywords Keyphrase extraction · Word embedding · Keyphrase indexing · External knowledge · Free indexing · Natural language processing

* Vishal Gupta

1 University Institute of Engineering & Technology, Panjab University, Chandigarh, India

Multimedia Tools and Applications

1 Introduction

With the rapid growth of the web and of storage techniques, rich sources of text information are becoming readily available. Gleaning real-time insights from these vast amounts of text is a challenging task because repositories of digital documents grow monotonically. As the size and complexity of repositories continue to grow exponentially [27], the need to retrieve the main content from such large volumes of data has given rise to flexible key-phrase extraction and topic indexing techniques.

Definition 1 (Keyphrase) Let $K$ denote the keyword set extracted from documents, where $K_i = \{k_{i1}, \ldots, k_{ij}\}$ is a set of key-phrases that best represent the document $D$. The most representative key-phrases should satisfy the criterion of being well noticed.

Natural Language Processing plays a significant role in cognitive computing systems that simulate the thought process of humans to understand and process human knowledge.
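
Definition 1 can be made concrete with a small sketch. The code below is only an illustration of the data structure the definition describes (one ranked key-phrase list per document); it is not the method proposed in this article. The stopword list, the frequency-based score and the function names `candidate_phrases`, `extract_keyphrases` and `keyphrase_sets` are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the article).
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "is", "for",
             "on", "with", "from", "that", "are"}


def candidate_phrases(text, max_len=3):
    """Return runs of consecutive non-stopword tokens as candidate phrases.

    Punctuation and stopwords both act as phrase boundaries; runs longer
    than max_len tokens are discarded.
    """
    phrases = []
    for chunk in re.split(r"[.,;:!?()]+", text.lower()):
        current = []
        for token in re.findall(r"[a-z]+", chunk):
            if token in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(token)
        if current:
            phrases.append(current)
    return [" ".join(p) for p in phrases if len(p) <= max_len]


def extract_keyphrases(document, j=3):
    """Return up to j phrases, ranked by raw frequency (a stand-in score)."""
    counts = Counter(candidate_phrases(document))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:j]]


def keyphrase_sets(documents, j=3):
    """Map each document index i to its key-phrase list K_i."""
    return {i: extract_keyphrases(doc, j) for i, doc in enumerate(documents)}
```

Raw frequency is used here only because it is the simplest score to follow; any of the feature-based scores discussed in the article could take its place without changing the shape of the output.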