Integrating Machine Learning Techniques in Semantic Fake News Detection



Adrian M. P. Braşoveanu 1,3 · Răzvan Andonie 2,3

Accepted: 3 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Adrian M. P. Braşoveanu: [email protected]
Răzvan Andonie: [email protected]

1 MODUL Technology GmbH, Vienna, Austria
2 Computer Science Department, Central Washington University, Ellensburg, WA, USA
3 Electronics and Computers Department, Transilvania University of Braşov, Braşov, Romania

Abstract
The nuances of language, as well as the varying degrees of truth observed in news items, make fake news detection a difficult problem to solve. A news item is never launched without a purpose; therefore, to understand its motivation it is best to analyze the relations between the speaker and the subject, as well as different credibility metrics. Inferring details about the various actors involved in a news item requires a hybrid approach that mixes machine learning, semantics and natural language processing. This article discusses a semantic fake news detection method built around relational features like sentiment, entities or facts extracted directly from text. Our experiments focus on short texts with different degrees of truth and show that adding semantic features significantly improves accuracy.

Keywords NLP · Semantics · Relation extraction · Deep learning

1 Introduction

Detecting fake news is an interdisciplinary problem, as it requires us to examine which methods were used to disseminate the news (e.g., social networks [53]), the links between the various actors involved (e.g., by using the information available in public Knowledge Graphs like Wikipedia), the propaganda tools (e.g., language can often be examined through the lens of semantics [6]) or even the geopolitics (e.g., as proven by the Cambridge Analytica scandal, some news might be targeted at specific groups who are more likely to respond to it). At a superficial level it is important to distinguish between satire and political weapons (or any other kind of weapons built on top of deceptive news) [8], or between the various news outlets that spread them, but when analyzing a news item it often helps to deploy a
varied Natural Language Processing (NLP) arsenal that includes sentiment analysis, Named Entity Recognition, Linking and Classification (NERLC [26]), n-grams, topic detection, part-of-speech (POS) taggers, query expansion or relation extraction [65]. NLP tools are often supported by large Knowledge Graphs (KGs) like DBpedia [35], which collects data about entities and concepts extracted from Wikipedia. The extracted named entities and relations are linked to such KGs whenever possible, whereas various sentiment aspects, polarity or subjectivity might be computed according to the detected entities. Features like sentiment, named entities or relations render a set of shallow meaning representations, and are typically called semantic features.
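To illustrate the kind of shallow semantic features discussed above, the sketch below extracts linked entities and sentiment from a short news statement. It is not the authors' pipeline: the public DBpedia Spotlight annotation endpoint and the TextBlob sentiment model are assumptions made only for this example.

```python
# Illustrative sketch: shallow semantic features for a short claim.
# Assumes the public DBpedia Spotlight demo endpoint (entity linking)
# and TextBlob (polarity/subjectivity); neither is prescribed by the paper.
import requests
from textblob import TextBlob

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

def semantic_features(text: str, confidence: float = 0.5) -> dict:
    """Return linked DBpedia entities plus sentiment scores for a statement."""
    # Entity linking: map each detected surface form to a DBpedia resource (KG node).
    response = requests.get(
        SPOTLIGHT_URL,
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=10,
    )
    response.raise_for_status()
    resources = response.json().get("Resources", [])
    entities = [
        {"surface": r["@surfaceForm"], "uri": r["@URI"], "types": r["@types"]}
        for r in resources
    ]

    # Sentiment: polarity in [-1, 1] and subjectivity in [0, 1] over the whole claim.
    blob = TextBlob(text)
    return {
        "entities": entities,
        "polarity": blob.sentiment.polarity,
        "subjectivity": blob.sentiment.subjectivity,
    }

if __name__ == "__main__":
    claim = "The president said the unemployment rate doubled last year."
    print(semantic_features(claim))
```

In a pipeline of the kind described here, such entity- and sentiment-level features would typically be combined with other relational features and passed to a downstream classifier.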