Multilingual Protest Event Data Collection with GATE


1 Russian Presidential Academy (RANEPA), Moscow, Russia
2 Autonomous University of Barcelona, Barcelona, Spain
3 ITMO University, St. Petersburg, Russia
4 Saint Petersburg State University, St. Petersburg, Russia

Abstract. Protest event databases are key sources that sociologists need to study the dynamics and properties of collective action. This paper describes a finite-state approach to collecting protest event features from short texts (news lead sentences) in several European languages (Bulgarian, French, Polish, Russian, Spanish, Swedish) using the General Architecture for Text Engineering (GATE). The results of the annotation performance evaluation are presented.

Keywords: Protest event data · Event features · Information extraction · GATE

1 Introduction

Protest activity reflects people's satisfaction, confidence and belief in their leaders. It has been a hot issue in recent years. “Presidents, prime ministers and assorted rulers, consider that you have been warned: A massive protest can start at any time, seemingly over any issue, and can grow to a size and intensity no one expected. Your country's image, your own prestige, could risk unraveling as you face the wrath of the people,” wrote Frida Ghitis, special to CNN, in June 2013, referring to the dramatic events that took place in Turkey and Brazil. November of the same year marked the beginning of the civil unrest in Ukraine, which led to the coup d'état and grew into an international conflict and a civil war. Research on protest, and on contentious collective action more broadly, is of prime interest to social scientists and government workers. For decades, sociologists from institutions such as the Berkman Center for Internet and Society, the Cline Center for Democracy at the University of Illinois at Urbana-Champaign, and others have been studying regularities in contentious collective action and accumulating statistics for protest prediction and for the analysis of its origins, dynamics and aftermath from protest event data (single events, small event sets and, more recently, big data).

Under partial support of the Government of the Russian Federation Grant 074-U01.

© Springer International Publishing Switzerland 2016. E. Métais et al. (Eds.): NLDB 2016, LNCS 9612, pp. 115–126, 2016. DOI: 10.1007/978-3-319-41754-7_10

V. Danilova et al.

Earlier protest studies were based on manual analysis of newspaper data. Since the 1990s, automatic approaches, mostly relying on natural language processing techniques, have been applied to protest database population and coding. News media remain the most widely used source for protest event data collection. Their advantages are accessibility and good temporal coverage, while a biased view of events is known to be their main pitfall. More recently, social media data have been used to study the connections between real and virtual protest activity, between protest-related news reports often controlled by the government and social media dis