Developing Crowdsourced Training Data Sets for Pharmacovigilance Intelligent Automation



ORIGINAL RESEARCH ARTICLE

Alex Gartland1 · Andrew Bate2 · Jeffery L. Painter3 · Tim A. Casperson4 · Gregory Eugene Powell5

Accepted: 23 November 2020
© Springer Nature Switzerland AG 2020

Abstract

Introduction  Machine learning offers an alluring solution to developing automated approaches to the increasing individual case safety report burden being placed upon pharmacovigilance. Leveraging crowdsourcing to annotate unstructured data may provide accurate, efficient, and contemporaneous training data sets in support of machine learning.

Objective  The objective of this study was to evaluate whether crowdsourcing can be used to accurately and efficiently develop training data sets in support of pharmacovigilance automation.

Materials and Methods  Pharmacovigilance experts created a reference dataset by reviewing 15,490 de-identified social media posts of narratives pertaining to 15 drugs and 22 medically relevant topics. A random sampling of posts from the reference dataset was published on Amazon Turk and its users (Turkers) were asked a series of questions about those same medical concepts. Accuracy, price elasticity, and time efficiency were evaluated.

Results  Accuracy of crowdsourced curation exceeded 90% when compared to the reference dataset and was completed in about 5% of the time. There was an increase in time efficiency with higher pay, but there was no significant difference in accuracy. Additionally, having a social media post reviewed by more than one Turker (using a voting system) did not offer significant improvements in terms of accuracy.

Conclusions  Crowdsourcing is an accurate and efficient method that can be used to develop training data sets in support of pharmacovigilance automation. More research is needed to better understand the breadth and depth of possible uses as well as strengths, limitations, and generalizability of results.
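The comparison described in the abstract (single-Turker answers versus a voting system, each scored against the expert reference dataset) can be sketched as follows. This is a minimal illustration only: the function names, the post identifiers, and the toy labels are all hypothetical, the vote is assumed to be a simple majority, and none of the numbers come from the study.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, reference):
    """Fraction of posts whose predicted label matches the reference label."""
    matches = sum(predicted[post_id] == reference[post_id] for post_id in reference)
    return matches / len(reference)


# Hypothetical toy data: expert reference labels for one yes/no question.
reference = {"post1": "yes", "post2": "no", "post3": "yes", "post4": "no"}

# Hypothetical answers from three Turkers per post.
turker_answers = {
    "post1": ["yes", "yes", "no"],
    "post2": ["no", "no", "no"],
    "post3": ["no", "yes", "yes"],
    "post4": ["yes", "no", "yes"],
}

# Single-reviewer condition: keep only the first Turker's answer.
single = {pid: answers[0] for pid, answers in turker_answers.items()}

# Voting condition: aggregate all answers per post by simple majority.
voted = {pid: majority_vote(answers) for pid, answers in turker_answers.items()}

single_acc = accuracy(single, reference)
voted_acc = accuracy(voted, reference)
print(f"single reviewer: {single_acc:.2f}, majority vote: {voted_acc:.2f}")
```

The toy labels are arranged only to exercise both code paths; whether voting helps on real data is an empirical question, and the study reports no significant improvement from it.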

Key Points

Wider deployment of machine learning in pharmacovigilance requires further algorithmic evaluations, and appropriate contemporaneous test sets are lacking.

Crowdsourcing has become a frequently leveraged approach to a wide range of challenges; we present its application to the review of public domain data of potential use to safety.

We evaluated the crowdsourced approach and showed it to be a scalable, rapid, and effective approach for developing annotated social media data.

Supplementary Information  The online version contains supplementary material available at https://doi.org/10.1007/s40264-020-01028-w.

* Gregory Eugene Powell
[email protected]

1 College of Medicine, University of Central Florida, Orlando, FL, USA
2 Safety and Medical Governance, GlaxoSmithKline, London, UK
3 JiveCast, Raleigh, NC, USA
4 North American Medical Affairs, GlaxoSmithKline, Research Triangle Park, NC, USA
5 Pharma Safety, GlaxoSmithKline, 5 Moore Dr., Research Triangle Park, NC 27709, USA




1 Introduction

The volume of individual case