CyberBERT: BERT for cyberbullying identification



SPECIAL ISSUE PAPER

Sayanta Paul¹ · Sriparna Saha¹
Received: 9 July 2020 / Accepted: 23 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Cyberbullying can be defined as a deliberate, repeated, and aggressive act carried out through social media platforms such as Facebook, Twitter, Instagram, and others. BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art pre-trained language model, has achieved remarkable results in many language understanding tasks. In this paper, we present a novel application of BERT to cyberbullying identification. A straightforward classification model built on BERT achieves state-of-the-art results on three real-world corpora: Formspring (∼12k posts), Twitter (∼16k posts), and Wikipedia (∼100k posts). Experimental results demonstrate that the proposed model achieves significant improvements over existing approaches, such as slot-gated and attention-based deep neural network models.

Keywords Cyberbullying · Language model · Deep learning · BERT
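In the original BERT fine-tuning recipe, a "straightforward classification model" typically places a single linear layer and a softmax on top of the final hidden state of the [CLS] token. The sketch below illustrates only that head, in plain Python, under several stated assumptions. It uses a two-class label set (bullying / not bullying) and a random vector standing in for BERT's [CLS] output. The names `ClassificationHead` and `sgd_step` and the toy hidden size are hypothetical. Unlike real fine-tuning, the sketch leaves the encoder weights untouched. It is not the authors' implementation.

```python
import math
import random

# Toy sizes chosen for illustration; BERT-base actually uses a 768-dim hidden state.
HIDDEN_SIZE = 8
NUM_LABELS = 2  # assumption: label 0 = not bullying, label 1 = bullying


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ClassificationHead:
    """Hypothetical linear + softmax head applied to a [CLS] vector."""

    def __init__(self, hidden_size, num_labels, seed=0):
        rng = random.Random(seed)
        # W: num_labels x hidden_size, small random init; b: one bias per label
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                  for _ in range(num_labels)]
        self.b = [0.0] * num_labels

    def logits(self, cls_vec):
        # z_k = W_k . h_cls + b_k for each label k
        return [sum(w * x for w, x in zip(row, cls_vec)) + bias
                for row, bias in zip(self.W, self.b)]

    def predict_proba(self, cls_vec):
        return softmax(self.logits(cls_vec))

    def loss(self, cls_vec, label):
        # Cross-entropy: -log p(label | post)
        return -math.log(self.predict_proba(cls_vec)[label])

    def sgd_step(self, cls_vec, label, lr=0.1):
        # Gradient of cross-entropy w.r.t. logit k is p_k - y_k.
        # Only the head is updated here; real fine-tuning also updates the encoder.
        probs = self.predict_proba(cls_vec)
        for k in range(len(self.W)):
            grad = probs[k] - (1.0 if k == label else 0.0)
            for j, x in enumerate(cls_vec):
                self.W[k][j] -= lr * grad * x
            self.b[k] -= lr * grad


if __name__ == "__main__":
    rng = random.Random(42)
    # Random stand-in for the [CLS] vector a pretrained encoder would produce
    cls_vec = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
    head = ClassificationHead(HIDDEN_SIZE, NUM_LABELS)
    before = head.loss(cls_vec, label=1)
    for _ in range(20):
        head.sgd_step(cls_vec, label=1)
    after = head.loss(cls_vec, label=1)
    print(f"loss before={before:.4f}, after={after:.4f}")
```

In practice, the [CLS] vector would come from a pretrained encoder (768-dimensional for BERT-base). Fine-tuning would then back-propagate the same p − y logit gradient into every encoder layer, rather than updating the head alone.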

¹ Department of Computer Science and Engineering, Indian Institute of Technology Patna, Bihta, Bihar, India
* Corresponding author: Sriparna Saha

1 Introduction

Online social media platforms allow people to share and express their thoughts and feelings freely and publicly. This sharing spans a wide range of technology-enabled activities, e.g., photo sharing, blogging, social gaming, social video sharing, business networking, comments and reviews, and many others. The information available on these platforms is a rich resource for sentiment analysis and for studying their increasing uses and abuses. The growth of social networking has also brought continuous harassment and stalking, commonly referred to as cyberbullying [1]. Broadly, cyberbullying can take different forms, such as racism (e.g., facial features, skin colour), sexism (e.g., male, female), physical appearance (e.g., ugly, fat), intelligence (e.g., ass, stupid), and so on. Sometimes cyberbullying is anonymous¹, i.e., the perpetrator is hard to trace, and this can have intense and devastating effects. Detecting cyberbullying at an early stage is therefore a crucial step to prevent it and to avoid any fatal incidents it may cause. In recent years, researchers have focused on developing machine learning and deep learning-based methods for the cyberbullying problem.

Classifying texts into specific categories is a classic problem in Natural Language Processing (NLP). The important intermediate steps are neural architecture design and data representation using word embeddings. Such deep language representations have long been a crucial factor in efficient text categorization. Bidirectional Encoder Representations from Transformers (BERT) [2] was proposed recently and has been successfully used to develop several state-of-the-