A framework for crime data analysis using relationship among named entities
SOFT COMPUTING TECHNIQUES: APPLICATIONS AND CHALLENGES
Priyanka Das1 • Asit Kumar Das1 • Janmenjoy Nayak2 • Danilo Pelusi3
Received: 12 November 2018 / Accepted: 12 March 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019
Abstract Many crime reports are available online in blogs and newswire sources. Although manually annotating such a large volume of reports is tedious, the reports together give an overall picture of crime across the world. This motivates us to propose a framework for crime data analysis based on online reports. Initially, the method extracts the crime reports and identifies named entities. The intermediate sequence of context words between every consecutive pair of named entities is termed a crime vector, which captures the relationship between the two entities. The feature vectors for each entity pair are generated from these crime vectors using the Word2Vec model. The paper considers three different types of named entity pairs to facilitate the major crime data analysis tasks, and for each type, the similarity between entity pairs is measured using their respective feature vectors. For each type of named entity pair, a separate weighted graph is generated with entity pairs as vertices and the similarity score between them as the weight of the corresponding edge. Then Infomap, a graph-based clustering algorithm, is applied to obtain an optimal set of clusters of entity pairs and a representative entity pair for each cluster. Each cluster is labelled by the relationship, represented by the crime vector, of its representative entity pair. In practice, not all entity pairs in a cluster reflect contextual similarity with their representative entity pair. The clusters are therefore further partitioned into subclusters using a WordNet-based path similarity measure, which makes the entity pairs in each subcluster more contextually similar than in their original cluster. These subclusters provide various statistical crime information over the time period. The method is evaluated only on reports of crimes against women in India. The experimental results demonstrate the effectiveness and superiority of the method compared to others for analysing the crime data.
Keywords Crime analysis • Online news • Entity recognition • Relation extraction • Paraphrase extraction • Graph-based clustering
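The first steps of the pipeline described in the abstract (extracting the context words between consecutive named entities, then building a weighted similarity graph over entity pairs) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `crime_vectors`, `cosine`, and `similarity_graph` are hypothetical, the toy sentences are invented, entity spans are supplied by hand rather than by a named entity recogniser, and simple bag-of-words counts stand in for the Word2Vec feature vectors used in the paper. Infomap clustering and the WordNet-based refinement are not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def crime_vectors(tokens, entity_spans):
    """Return (entity_a, entity_b, context_words) for each consecutive entity pair.

    entity_spans is a list of half-open (start, end) token ranges sorted by start.
    The context words are the tokens lying strictly between the two entities.
    """
    result = []
    for (s1, e1), (s2, e2) in zip(entity_spans, entity_spans[1:]):
        entity_a = " ".join(tokens[s1:e1])
        entity_b = " ".join(tokens[s2:e2])
        result.append((entity_a, entity_b, tokens[e1:s2]))
    return result


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(pairs, threshold=0.0):
    """Build a weighted graph: vertices are entity-pair indices, edges carry similarity.

    ASSUMPTION: bag-of-words counts replace the Word2Vec features of the paper.
    Only edges with similarity above `threshold` are kept.
    """
    features = [Counter(w.lower() for w in context) for _, _, context in pairs]
    edges = {}
    for i, j in combinations(range(len(pairs)), 2):
        score = cosine(features[i], features[j])
        if score > threshold:
            edges[(i, j)] = score
    return edges


# Invented toy sentences with hand-marked entity spans (illustrative only).
sentences = [
    ("Ramesh was arrested in Delhi".split(), [(0, 1), (4, 5)]),
    ("Suresh was arrested at Mumbai".split(), [(0, 1), (4, 5)]),
    ("Anil fled from Pune".split(), [(0, 1), (3, 4)]),
]

pairs = []
for tokens, spans in sentences:
    pairs.extend(crime_vectors(tokens, spans))

edges = similarity_graph(pairs)
for (i, j), weight in edges.items():
    print(pairs[i][:2], pairs[j][:2], round(weight, 3))
```

In this toy input, only the first two entity pairs share context words, so they are the only vertices joined by an edge. A graph of this shape is what a community detection algorithm such as Infomap would then take as input.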
✉ Priyanka Das
1 Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India
2 Department of Computer Science and Engineering, Sri Sivani College of Engineering, Chilakapalem, Andhra Pradesh, India
3 Department of Communications Sciences, University of Teramo, Teramo, Italy
1 Introduction Interne