Hashtag recommendation for short social media texts using word-embeddings and external knowledge


Nagendra Kumar¹ · Eshwanth Baskaran² · Anand Konjengbam² · Manish Singh²

Received: 7 December 2018 / Revised: 25 September 2020 / Accepted: 27 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

With the rapid growth of Twitter in recent years, there has been a tremendous increase in the number of tweets generated by users. Twitter allows users to add hashtags to facilitate effective categorization and retrieval of tweets. Despite the usefulness of hashtags, a large fraction of tweets do not contain any. Several methods have been proposed to recommend hashtags based on lexical and topical features of tweets. However, semantic features and data sparsity in tweet representation have rarely been addressed by existing methods. In this paper, we propose a novel method for hashtag recommendation that resolves the data sparsity problem by exploiting the most relevant tweet information from external knowledge sources. In addition to lexical and topical features, the proposed method incorporates semantic features based on word embeddings and a user influence feature based on users' influential positions. To combine the strengths of hashtag recommendation methods based on different features, the proposed method aggregates them using learning-to-rank and generates top-ranked hashtags. Experimental results show that the proposed method significantly outperforms the current state-of-the-art methods.

Keywords Hashtag recommendation · Social media analysis · Information extraction and filtering · Semantic knowledge bases

Nagendra Kumar [email protected] Eshwanth Baskaran [email protected] Anand Konjengbam [email protected] Manish Singh [email protected]

¹ Indian Institute of Technology Indore, Indore 453552, India
² Indian Institute of Technology Hyderabad, Hyderabad 502285, India

N. Kumar et al.

1 Introduction

Over the past few years, hashtags have been widely used in social media to convey topical information about user-generated content. Hashtags have been shown to be useful in many applications, including event detection [1], information diffusion [5], sentiment analysis [6], information retrieval [9], and text classification [42]. However, hashtags are created manually, and many social media texts contain none because users are unsure about or unwilling to use them. We therefore take up the task of automatically recommending hashtags for social media texts.

In this paper, we use publicly accessible tweets from Twitter to create our dataset. Twitter is one of the biggest social networking platforms, with millions of active users. Users share information with their friends and followers in the form of tweets. Tweets are short texts with a maximum length of 280 characters. Because of this length constraint, tweets are usually broadcast with limited context. Hashtags provide a better representation of tweets and