Multi-label classification and knowledge extraction from oncology-related content on online social networks

Mahdi Hashemi¹ · Margeret Hall²

© Springer Nature B.V. 2020

Abstract This study aims to automatically process and extract knowledge from large amounts of oncology-related content on online social networks (OSN). To this end, a large number of OSN textual posts concerning major cancer types are automatically scraped and structured using natural language processing techniques. Machines are trained to assign multiple labels to each post based on the type of knowledge it contains, if any. The trained machines are then used to automatically classify large-scale textual posts, and statistical inferences are drawn from these predictions to extract general concepts and abstract knowledge. Different approaches for constructing document feature vectors showed no tangible effect on classification accuracy. Among the classifiers compared, logistic regression achieved the highest overall accuracy (96.4%) and F1 (73.4) in a 13-way multi-label classification of textual posts. The most common topic was seeking or providing moral support for cancer patients, followed by providing technical information about cancer causes and treatments. The most common causes and treatments of different types of cancer discussed on OSN are also automatically detected in this study. Seeking or providing moral support for cancer patients shared the largest overlap with other topics, i.e. moral support tends to be present even in OSN posts that focus on other topics. In contrast, providing technical information about cancer diagnosis and providing technical information about cancer prevention were the most isolated topics: OSN posts on these topics tend not to allude to other topics. OSN posts that seek financial support overlap, if at all, only with the moral support topic. Our methodology and results give public health professionals an opportunity to monitor which topics are being discussed on OSN and to what extent, to see what specific information and knowledge are being disseminated over OSN, and to assess the veracity of that information in close to real time.
This helps them to develop policies that encourage, discourage, or modify the consumption of viral oncology-related information on OSN.

Keywords Cancer · Social networks · Natural language processing · Machine learning · Classification · Knowledge extraction
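The abstract describes training machines to assign several labels to each post and reports both an accuracy and an F1 score. One common way to build such a multi-label classifier is binary relevance: one independent logistic regression per topic. The sketch below illustrates that idea together with two metrics. It is a minimal, self-contained sketch under stated assumptions, not the authors' pipeline: the binary bag-of-words features, the gradient-descent trainer, its hyperparameters (500 epochs, learning rate 0.5, L2 weight 0.01), the function names (`build_vocab`, `featurize`, `train_one_vs_rest`, `predict`, `labelwise_accuracy`, `micro_f1`), and the toy posts and topics are all hypothetical. The abstract also does not state whether its accuracy is cell-wise (1 - Hamming loss) or how its F1 is averaged.

```python
# Hypothetical sketch: binary-relevance (one-vs-rest) logistic regression
# for multi-label topic tagging, plus cell-wise accuracy and micro-F1.
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) for one topic


def build_vocab(posts: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map each distinct token to a column index (sorted for reproducibility)."""
    return {tok: i for i, tok in enumerate(sorted({t for p in posts for t in p}))}


def featurize(tokens: Sequence[str], vocab: Dict[str, int]) -> List[float]:
    """Binary bag-of-words vector; any other document representation could be swapped in."""
    x = [0.0] * len(vocab)
    for t in tokens:
        if t in vocab:
            x[vocab[t]] = 1.0
    return x


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_lr(X: List[List[float]], y: List[int], epochs: int = 500,
                    lr: float = 0.5, l2: float = 0.01) -> Model:
    """Full-batch gradient descent on the L2-regularized logistic loss for one topic."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_rest(X: List[List[float]], Y: List[List[int]]) -> List[Model]:
    """Binary relevance: fit one independent classifier per topic column of Y."""
    return [train_binary_lr(X, [row[k] for row in Y]) for k in range(len(Y[0]))]


def predict(models: List[Model], x: List[float], threshold: float = 0.5) -> List[int]:
    """Return a 0/1 vector; a post may receive zero, one, or several topics."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)
            for w, b in models]


def labelwise_accuracy(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """Fraction of correct (post, topic) cells, i.e. 1 - Hamming loss."""
    cells = [(t, p) for rt, rp in zip(Y_true, Y_pred) for t, p in zip(rt, rp)]
    return sum(t == p for t, p in cells) / len(cells)


def micro_f1(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """Micro-averaged F1 = 2TP / (2TP + FP + FN), pooled over all topics."""
    tp = fp = fn = 0
    for rt, rp in zip(Y_true, Y_pred):
        for t, p in zip(rt, rp):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical toy data: 3 topics (moral support, treatment info, financial support).
POSTS = [["stay", "strong", "praying"], ["chemotherapy", "radiation"],
         ["donate", "bills"], ["praying", "donate"], ["strong", "radiation"]]
LABELS = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 1, 0]]
```

Usage on the toy data: build the vocabulary with `build_vocab(POSTS)`, featurize each post, fit with `train_one_vs_rest`, then call `predict` per post. The two metrics behave very differently when labels are sparse. Cell-wise accuracy is dominated by true negatives: on a hypothetical set of 13 topics where posts carry 1.3 topics on average, a predictor that never assigns any topic already reaches 90% cell-wise accuracy with a micro-F1 of 0. This is one plausible reason a high accuracy can coexist with a much lower F1, though the abstract does not define the metrics it reports.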
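The abstract's statistical inferences (which topic is most common, and which topics co-occur with or stay isolated from others) can be derived from a binary matrix of predicted labels. The sketch below assumes one plausible definition of overlap: for each topic, the share of its posts that also carry at least one other topic, alongside pairwise co-occurrence counts. The abstract does not give the paper's exact measure, and the function names (`prevalence`, `cooccurrence`, `overlap_share`), topic names, and label matrix are hypothetical.

```python
# Hypothetical sketch: topic prevalence and overlap statistics from a
# predicted multi-label matrix (rows = posts, columns = topics, values 0/1).
from itertools import combinations
from typing import Dict, Sequence, Tuple


def prevalence(Y: Sequence[Sequence[int]], topics: Sequence[str]) -> Dict[str, float]:
    """Share of posts assigned each topic."""
    n = len(Y)
    return {t: sum(row[k] for row in Y) / n for k, t in enumerate(topics)}


def cooccurrence(Y: Sequence[Sequence[int]],
                 topics: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Number of posts carrying each unordered pair of topics."""
    counts = {pair: 0 for pair in combinations(topics, 2)}
    for row in Y:
        # Active topics keep the column order, so pairs match the keys above.
        active = [t for t, v in zip(topics, row) if v]
        for pair in combinations(active, 2):
            counts[pair] += 1
    return counts


def overlap_share(Y: Sequence[Sequence[int]], topics: Sequence[str]) -> Dict[str, float]:
    """Per topic: fraction of its posts that also carry at least one other topic."""
    share = {}
    for k, t in enumerate(topics):
        with_t = [row for row in Y if row[k]]
        mixed = sum(1 for row in with_t if sum(row) > 1)
        share[t] = mixed / len(with_t) if with_t else 0.0
    return share


# Hypothetical predicted labels for 7 posts over 4 topics.
TOPICS = ["moral_support", "treatment_info", "diagnosis_info", "financial_support"]
Y_PRED = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 1], [0, 1, 0, 0],
          [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
```

On this toy matrix, moral support is both the most prevalent topic (4 of 7 posts) and the one with the highest overlap share (0.75), while the diagnosis topic never co-occurs with another topic. The numbers are illustrative only and are not the paper's results.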

* Mahdi Hashemi (corresponding author)

1 Department of Information Sciences and Technology, George Mason University, 4400 University Dr, Fairfax, VA 22030, USA

2 College of Information Science and Technology, University of Nebraska at Omaha, 1110 S 67th St, Omaha, NE 68182, USA

1 Introduction

Traditionally, patients received health-related information by meeting in person with their doctors or medical staff, and they could share details of their condition only with their families or people close to them. However, multiple recent studies have shown that cancer patients are increasingly employing online social networks (OSN), not only to express their person