A Fuzzy Methodology for Clustering Text Documents with Uncertain Spatial References




Abstract The Fuzzy ERC (Extraction, Resolving and Clustering) architecture is proposed for handling uncertain information. The uncertain information can be queried explicitly by the user, and the system can also cluster documents based on the spatial keywords present in them. This research work applies fuzzy logic techniques along with information retrieval methods to resolve the spatial uncertainty in text and to find the spatial similarity between documents, that is, the degree to which two or more documents talk about the same spatial location. An experimental analysis is performed with the Reuters data set. The experimental results give empirical evidence of document clustering based on the spatial references present in the documents. It is concluded that the proposed work gives users a new way of retrieving documents that have similar spatial references.

Keywords Information retrieval · Fuzzy logic · Text clustering · Uncertain spatial reference

V.R. Kanagavalli (✉)
Department of Computer Sciences and Applications, Faculty of Science & Humanities, Sathyabama University, Chennai, India
e-mail: [email protected]

V.R. Kanagavalli
Department of Computer Applications, Sri Sai Ram Engineering College, Chennai, India

K. Raja
Alpha College of Engineering, Chennai, India
e-mail: [email protected]

© Springer Science+Business Media Singapore 2016
M. Senthilkumar et al. (eds.), Computational Intelligence, Cyber Security and Computational Models, Advances in Intelligent Systems and Computing 412, DOI 10.1007/978-981-10-0251-9_13

1 Introduction

The information and knowledge sharing era is exploding with information that people continuously share over various sources across the globe. Most of this information is presented, transferred, and shared using natural language, since it provides flexibility and spontaneity to the users. Along with the spontaneity comes the issue of vagueness and ambiguity. There are document classification systems that classify and group documents that speak about the same concept. The same type of classification is not handled successfully, however, when it is based on spatial keywords. This is due to the inherent ambiguity and uncertainty associated with the spatial terms found in natural language descriptions. Most text documents contain spatial references, and users' queries are often associated with a spatial location. At present it is very difficult to retrieve documents that discuss the same geographic location using different terms. The source of the difficulty is the level of uncertainty and fuzziness associated with natural language. This research work therefore proposes algorithms for clustering text documents based on the crisp and uncertain spatial references present in them.
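To make the idea of spatial similarity between documents concrete, the following is a minimal sketch and not the algorithm proposed in this work. It assumes that each spatial reference has already been extracted as a (qualifier, place) pair, that each qualifier maps to an illustrative membership degree, and that similarity is measured with a fuzzy Jaccard ratio. The names `QUALIFIER_WEIGHT`, `spatial_profile`, and `fuzzy_similarity`, as well as all numeric weights, are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical membership degrees (assumption, not from the paper):
# 1.0 marks a crisp reference, lower values mark uncertain ones.
QUALIFIER_WEIGHT: Dict[str, float] = {"in": 1.0, "near": 0.6, "around": 0.5}
DEFAULT_WEIGHT = 0.3  # assumed degree for qualifiers not listed above


def spatial_profile(references: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each place name to the strongest membership degree found for it."""
    profile: Dict[str, float] = {}
    for qualifier, place in references:
        degree = QUALIFIER_WEIGHT.get(qualifier, DEFAULT_WEIGHT)
        profile[place] = max(profile.get(place, 0.0), degree)
    return profile


def fuzzy_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Fuzzy Jaccard ratio: sum of min memberships over sum of max memberships."""
    places = set(a) | set(b)
    if not places:
        return 0.0  # two documents without spatial references share nothing
    intersection = sum(min(a.get(p, 0.0), b.get(p, 0.0)) for p in places)
    union = sum(max(a.get(p, 0.0), b.get(p, 0.0)) for p in places)
    return intersection / union


# Two toy documents described by already-extracted spatial references.
doc_a = spatial_profile([("in", "Chennai"), ("near", "Madurai")])
doc_b = spatial_profile([("near", "Chennai")])
print(fuzzy_similarity(doc_a, doc_b))
```

With these assumed weights, the two toy documents overlap only on Chennai, so the ratio is 0.6 / 1.6, approximately 0.375. A clustering step could then group documents whose pairwise similarity exceeds a chosen threshold.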

2 Related Work

2.1 Uncertain Spatial Referencing in Natural Language

Natural language is prone to ambiguity and there is a vast amount of literature hand