A framework for crime data analysis using relationship among named entities


SOFT COMPUTING TECHNIQUES: APPLICATIONS AND CHALLENGES

Priyanka Das¹ • Asit Kumar Das¹ • Janmenjoy Nayak² • Danilo Pelusi³

Received: 12 November 2018 / Accepted: 12 March 2019 / © Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract Many crime reports are available online in various blogs and newswires. Although manually annotating these massive reports for crime data analysis is tedious, together they give an overall picture of crime all over the world. This motivates us to propose a framework for crime data analysis based on online reports. First, the method extracts the crime reports and identifies the named entities in them. The sequence of context words between every consecutive pair of named entities is termed a crime vector; it captures the relationship between the two entities. Feature vectors for each entity pair are generated from these crime vectors using the Word2Vec model. The paper considers three different types of named entity pairs to facilitate the major crime data analysis task, and for each type, the similarity between every two entity pairs is measured using their feature vectors. For each type of named entity pair, a separate weighted graph is built with entity pairs as vertices and the similarity score between two entity pairs as the weight of the corresponding edge. Then, Infomap, a graph-based clustering algorithm, is applied to obtain an optimal set of clusters of entity pairs and a representative entity pair for each cluster. Each cluster is labelled with the relationship, represented by the crime vector, of its representative entity pair. In practice, not all entity pairs in a cluster are contextually similar to their representative entity pair. Therefore, the clusters are further partitioned into subclusters using a WordNet-based path similarity measure, which makes the entity pairs within each subcluster more contextually similar than those in the original cluster. These subclusters provide various crime statistics over the time period studied. The method is evaluated only on crime reports related to crime against women in India. The experimental results demonstrate the effectiveness and superiority of the method compared to other methods for analysing crime data.

Keywords Crime analysis · Online news · Entity recognition · Relation extraction · Paraphrase extraction · Graph-based clustering
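The crime-vector step in the abstract can be illustrated with a minimal Python sketch. It assumes the reports are already tokenized and tagged by a named entity recognizer, using a hypothetical input format of `(word, entity_type)` tuples in which `entity_type` is `None` for ordinary words and multi-word entities are pre-merged into one token. `extract_crime_vectors`, `EntityPair` and the sample sentence are illustrative, not the authors' code or data.

```python
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical input format: (word, entity_type), with entity_type None for
# context words. Multi-word entities are assumed to be merged into one token.
Token = Tuple[str, Optional[str]]


class EntityPair(NamedTuple):
    left: str
    right: str
    pair_type: Tuple[str, str]   # e.g. ("PERSON", "LOCATION")
    crime_vector: List[str]      # context words between the two entities


def extract_crime_vectors(tokens: List[Token]) -> List[EntityPair]:
    """Pair every two consecutive named entities with the words between them."""
    pairs: List[EntityPair] = []
    prev: Optional[Tuple[str, str]] = None  # (word, type) of the last entity seen
    context: List[str] = []
    for word, ent_type in tokens:
        if ent_type is None:
            context.append(word)
            continue
        if prev is not None:
            pairs.append(EntityPair(prev[0], word, (prev[1], ent_type), context))
        # Start collecting context words for the next consecutive pair.
        prev = (word, ent_type)
        context = []
    return pairs


# Illustrative sentence with placeholder entities (not from the paper's data).
SENTENCE: List[Token] = [
    ("PersonA", "PERSON"), ("was", None), ("arrested", None), ("in", None),
    ("Howrah", "LOCATION"), ("for", None), ("stalking", None),
    ("PersonB", "PERSON"),
]

pairs = extract_crime_vectors(SENTENCE)
for p in pairs:
    print(p.left, "->", p.right, p.pair_type, p.crime_vector)
```

Each `EntityPair` records its `pair_type`, so pairs could be grouped by type before building one graph per type; the abstract does not name the three types used, so no filtering is shown.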
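Turning crime vectors into feature vectors and similarity-weighted edges can be sketched in the same spirit. The abstract only says that Word2Vec is used; averaging the word vectors of a crime vector and comparing entity pairs by cosine similarity are assumptions made here. The small `EMBEDDINGS` table is hypothetical and stands in for a trained Word2Vec model (with gensim, for example, trained vectors are reached through `model.wv`); `feature_vector`, `similarity_graph` and `min_weight` are illustrative names.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical 3-dimensional embeddings standing in for trained Word2Vec vectors.
EMBEDDINGS: Dict[str, List[float]] = {
    "arrested": [0.9, 0.1, 0.0],
    "detained": [0.85, 0.15, 0.05],
    "harassed": [0.1, 0.9, 0.2],
    "stalked": [0.15, 0.8, 0.3],
}
DIM = 3


def feature_vector(crime_vector: List[str]) -> List[float]:
    """Average the embeddings of known words; unknown words are skipped."""
    known = [EMBEDDINGS[w] for w in crime_vector if w in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / norm


def similarity_graph(crime_vectors: Dict[str, List[str]],
                     min_weight: float = 0.0) -> Dict[Tuple[str, str], float]:
    """Weighted edge list: vertices are entity-pair ids, weights are similarities."""
    feats = {pid: feature_vector(cv) for pid, cv in crime_vectors.items()}
    edges: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(feats), 2):
        w = cosine(feats[a], feats[b])
        if w > min_weight:  # keep only edges heavier than min_weight
            edges[(a, b)] = w
    return edges


# Hypothetical entity-pair ids mapped to their crime vectors.
CRIME_VECTORS = {
    "p0": ["was", "arrested"],
    "p1": ["detained"],
    "p2": ["harassed"],
    "p3": ["stalked", "and", "harassed"],
}
graph = similarity_graph(CRIME_VECTORS)
```

Because these toy vectors have no negative components, every similarity here is positive; with real embeddings, `min_weight` keeps negative cosine scores out of the graph, since flow-based clustering such as Infomap expects non-negative edge weights.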
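The abstract applies Infomap to each graph to obtain clusters and a representative entity pair per cluster. Infomap itself is available, for instance, as the `infomap` Python package; the dependency-free sketch below deliberately swaps in a much simpler stand-in, connected components over edges whose weight reaches a hypothetical `threshold`, so that the shape of the step is visible. Choosing as representative the member with the largest total weight on its retained edges is also an assumption; in an undirected weighted graph this is a rough proxy for the random-walk visit rates that Infomap works with. All function names and the edge data are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edges = Dict[Tuple[str, str], float]


def threshold_clusters(edges: Edges, threshold: float) -> List[Set[str]]:
    """Stand-in for Infomap: connected components over edges with weight >= threshold."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for (a, b), w in edges.items():
        nodes.update((a, b))
        if w >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    clusters: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sorted(nodes):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def representative(cluster: Set[str], edges: Edges, threshold: float) -> str:
    """Assumed rule: member with the largest total weight on retained in-cluster edges."""
    strength: Dict[str, float] = {n: 0.0 for n in cluster}
    for (a, b), w in edges.items():
        if w >= threshold and a in cluster and b in cluster:
            strength[a] += w
            strength[b] += w
    # max() keeps the first maximum, so ties go to the alphabetically first id.
    return max(sorted(cluster), key=lambda n: strength[n])


# Hypothetical similarity-weighted edges between entity-pair ids.
EDGES: Edges = {
    ("p0", "p1"): 0.99, ("p0", "p4"): 0.85, ("p1", "p4"): 0.95,
    ("p2", "p3"): 0.98, ("p0", "p2"): 0.21, ("p1", "p3"): 0.30,
}
THRESHOLD = 0.8
clusters = threshold_clusters(EDGES, THRESHOLD)
labels = {min(c): representative(c, EDGES, THRESHOLD) for c in clusters}
```

Each cluster would then be labelled with the crime vector of its representative, as the abstract describes.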
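The refinement step, splitting each cluster by WordNet-based path similarity, is sketched below under several assumptions. The toy `HYPERNYM` taxonomy is hypothetical and is not WordNet's real hierarchy; each entity pair is assumed to be represented by one head word from its crime vector; and the leader-style rule in `subclusters`, together with its `threshold`, is an illustrative choice, since the abstract does not state how the partition is formed. The score itself follows the WordNet path similarity definition, 1 / (shortest hypernym-path length + 1).

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); NOT WordNet's real hierarchy.
HYPERNYM: Dict[str, Optional[str]] = {
    "act": None,
    "attack": "act",
    "assault": "attack",
    "molest": "assault",
    "annoy": "act",
    "harass": "annoy",
    "stalk": "harass",
}


def ancestors(word: str) -> Dict[str, int]:
    """Map each ancestor (including the word itself) to its distance from word."""
    chain: Dict[str, int] = {}
    node: Optional[str] = word
    steps = 0
    while node is not None:
        chain[node] = steps
        node = HYPERNYM.get(node)  # unknown words are treated as roots
        steps += 1
    return chain


def path_similarity(a: str, b: str) -> float:
    """1 / (shortest path length + 1), the form used by WordNet path similarity."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = up_a.keys() & up_b.keys()
    if not common:
        return 0.0
    distance = min(up_a[c] + up_b[c] for c in common)
    return 1.0 / (distance + 1)


def subclusters(members: Dict[str, str], rep: str, threshold: float) -> List[List[str]]:
    """Leader-style split of one cluster; members maps entity-pair id -> head word.

    The representative seeds the first subcluster; every other member joins the
    first subcluster whose seed is similar enough, otherwise it seeds a new one.
    """
    groups: List[List[str]] = [[rep]]
    for pid in sorted(members):
        if pid == rep:
            continue
        for group in groups:
            if path_similarity(members[group[0]], members[pid]) >= threshold:
                group.append(pid)
                break
        else:
            groups.append([pid])
    return groups


# Hypothetical cluster: entity-pair ids and the head verb of each crime vector.
CLUSTER = {"p1": "harass", "p0": "stalk", "p4": "molest", "p5": "annoy"}
parts = subclusters(CLUSTER, rep="p1", threshold=0.5)
```

With real data, NLTK's WordNet interface provides this score as `Synset.path_similarity` once the WordNet corpus has been downloaded.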

1 Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India

2 Department of Computer Science and Engineering, Sri Sivani College of Engineering, Chilakapalem, Andhra Pradesh, India

3 Department of Communications Sciences, University of Teramo, Teramo, Italy

1 Introduction

Interne
