Automatic Classification of Forum Posts: A Finnish Online Health Discussion Forum Case




Abstract— Online health discussion forums play a key role in accessing, distributing and exchanging health information at an individual and societal level. Due to their free nature, using and regulating these forums requires a substantial amount of manual effort. In this study, we propose a computational approach, i.e., a machine learning framework, to categorize the messages from Finland’s largest online health discussion forum into 16 categories. An accuracy of 70.8% was obtained with a Naïve Bayes classifier applied to term frequency-inverse document frequency features.

Keywords— machine learning, natural language processing, online discussion forum, social media, topic classification.
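As a rough illustration of the kind of pipeline the abstract names, the following minimal sketch trains a multinomial Naïve Bayes classifier on term frequency-inverse document frequency (TF-IDF) weights using only the Python standard library. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the smoothing constant `alpha`, the function names, and the two toy categories with English example posts are all assumptions made for illustration. The study itself uses Finnish forum messages and 16 categories, which are not listed in this excerpt.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. Real Finnish text would
    # likely need language-aware preprocessing not described in this excerpt.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    # Raw term count times IDF; terms unseen during fitting are ignored.
    counts = Counter(tokenize(doc))
    return {t: n * idf[t] for t, n in counts.items() if t in idf}


def train_nb(docs, labels, alpha=1.0):
    # Multinomial NB where summed TF-IDF weights play the role of counts.
    idf = fit_idf(docs)
    weights = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf).items():
            weights[label][term] += w
    class_counts = Counter(labels)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        # Additive (Laplace) smoothing over the whole vocabulary.
        total = sum(weights[c].values()) + alpha * len(idf)
        log_like[c] = {
            t: math.log((weights[c].get(t, 0.0) + alpha) / total) for t in idf
        }
    return idf, log_prior, log_like


def predict(model, doc):
    # Pick the class with the highest log-posterior score.
    idf, log_prior, log_like = model
    feats = tfidf(doc, idf)
    scores = {
        c: log_prior[c] + sum(w * log_like[c][t] for t, w in feats.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy data; the category names are invented for this sketch.
train_docs = [
    "insulin dose and blood sugar levels",
    "blood sugar spikes after meals",
    "cannot fall asleep at night",
    "waking up tired after poor sleep",
]
train_labels = ["diabetes", "diabetes", "sleep", "sleep"]

model = train_nb(train_docs, train_labels)
print(predict(model, "high blood sugar in the morning"))
```

In practice such a classifier would be trained on many labeled messages per category and evaluated on held-out data. The toy corpus above only shows how the TF-IDF weights feed the class-conditional likelihoods.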

I. INTRODUCTION

Social media is one of the significant aspects of the current e-health ecosystem. Online health information seekers use the Internet and social media for several reasons, e.g., researching what other consumers say about medication or treatment, researching other consumers’ knowledge and experience, learning skills and gaining knowledge to manage a condition, getting emotional support, building awareness, and sharing knowledge [1]. Common platforms used by online health information consumers include blogs, wikis, social networks, live chat rooms, video-sharing websites, podcasts, online forums and message boards [2]. Social media use in healthcare is shown to have effects on patients such as enhanced psychological well-being and improved self-management and control [3]. On the other hand, addiction to social media, loss of privacy, and being targeted for promotion are also shown to be among the possible effects [3].

Online health discussion forums, while being prominent in online health communication, require governing and regulation in order to be efficient and successful, due to the large amounts of unstructured information they contain. Many online discussion forums separate discussion into topics and subtopics in order to provide an orderly means of communication to their users. Therefore, relevant categorization of a new message posted by a user has to rely either on the user’s judgment of the appropriate category or on manual assignment and correction by the forum administration. In this context, employing a machine learning based topic classification system can improve the quality of the online health discussion forum by assisting both users and administrators.

In [4], posts from a smoking cessation forum are classified using a Naïve Bayes (NB) classifier. Similarly, a binary NB classifier is trained with bag-of-words features in [5] in order to classify the questions in the WebMD diabetes community as important or not. In [6], a rule-based classification framework is proposed to categorize users’ intent in posting content into 4 categories. In [7], handcrafted text features are extracted from online cancer survivor community posts and several machine learning algorithms are applied to the data, resulting in up to 79.2% accuracy in classifying the sentiment.

© Springer Nature Singapore Pte Ltd. 2018 H. E