A Generalized Framework for Quantifying Trust of Social Media Text Documents

Abstract. Social media has become a very popular place for users seeking knowledge about a wide variety of topics. While it contains many helpful documents, it also contains many useless or malicious documents, including spam. For a casual observer, it is very hard to identify high-quality or trustworthy documents, and as the volume of such data increases, the task becomes more and more difficult. Many research works have focused on quantifying trust in specific social network domains, and some have quantified trust based on a social graph. In this work, we use one such social graph, the Reduced node Social Graph with Relationships (RSGR), and develop a three-step, syntax- and semantics-based trust mining framework. We generalize the concept of trust mining to all structured as well as unstructured, unsupervised text documents from all social network domains. We calculate trust based on metadata, then trust based on relationships with other documents, and finally propagate the trust calculated so far along the various relationship edges to obtain the final trust. We show that our method calculates the trust of social media text documents with more than 80% accuracy.

© Springer International Publishing Switzerland 2016. P. Perner (Ed.): MLDM 2016, LNAI 9729, pp. 692–713, 2016. DOI: 10.1007/978-3-319-41920-6_53

1 Introduction

Social media is growing day by day, at a rate of millions of documents per day. A document could be a Facebook post, a tweet, a blog, a review or even a video. A substantial portion of these documents is useful and is an excellent source for various kinds of social media analysis, such as sentiment analysis and segmentation analysis, to obtain knowledge. But with this huge amount of information available, there is an important need to differentiate between good and bad documents, since not all of these documents are useful and this cannot be determined without reading them manually. In the current scenario of social media analysis, documents are fed individually [25][22]. Because no relationships between the documents are considered, we know each document in isolation but not in the presence of other documents. So, we cannot say anything about the reliability of the individual documents. Let us call this reliability Trust. Hence the trust of the data is unknown, and there is no quick method to determine whether a document can actually be trusted. The need for Trust Mining comes from the fact that the better we understand the data, the better the analysis of that data will be. So, before feeding the documents into social media analysis, some trust distribution should be assigned to the dataset so that documents with low trust can be filtered out to improve the analysis. In social media we come across various kinds of spam, and because the documents are processed independently of each other, the result is a lack of context and an inability to remove noise from the incoming data. For ex