A Fuzzy Methodology for Clustering Text Documents with Uncertain Spatial References



Abstract A fuzzy ERC (Extraction, Resolving and Clustering) architecture is proposed for handling uncertain information. The information can be queried explicitly by the user, and the system can also cluster documents based on the spatial keywords present in them. This research work applies fuzzy logic techniques together with information retrieval methods to resolve spatial uncertainty in text and to find the spatial similarity between documents, that is, the degree to which two or more documents talk about the same spatial location. An experimental analysis is performed on the Reuters data set. The results obtained from the experiment give empirical evidence of document clustering based on the spatial references present in the documents. It is concluded that the proposed work gives users a new way of retrieving documents that have similar spatial references.

Keywords Information retrieval, Fuzzy logic, Text clustering, Uncertain spatial reference

V.R. Kanagavalli (corresponding author), Department of Computer Sciences and Applications, Faculty of Science & Humanities, Sathyabama University, Chennai, India; e-mail: [email protected]
V.R. Kanagavalli, Department of Computer Applications, Sri Sai Ram Engineering College, Chennai, India
K. Raja, Alpha College of Engineering, Chennai, India; e-mail: [email protected]
© Springer Science+Business Media Singapore 2016. M. Senthilkumar et al. (eds.), Computational Intelligence, Cyber Security and Computational Models, Advances in Intelligent Systems and Computing 412, DOI 10.1007/978-981-10-0251-9_13

1 Introduction

The information and knowledge sharing era is exploding with information that people continuously share through various sources across the globe. Most of this information is presented, transferred, and shared in natural language, since it gives users flexibility and spontaneity. Along with the spontaneity comes the issue of vagueness and ambiguity. Document classification systems exist that classify and group documents that discuss the same concept. The same type of classification, however, is not handled successfully when it is based on spatial keywords. This is due to the inherent ambiguity and uncertainty associated with the spatial terms found in natural language descriptions. Most text documents contain spatial references, and users' queries are often associated with a spatial location. At present it is very difficult to retrieve documents that discuss the same geographic location using different terms. The source of the difficulty is the level of uncertainty and fuzziness associated with natural language. This research work therefore proposes algorithms for clustering text documents based on the crisp and uncertain spatial references present in them.
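To make the idea of spatial similarity between documents concrete, the sketch below scores how strongly two documents refer to the same places. It is only an illustration under stated assumptions, not the algorithm proposed in this work: each document is assumed to have already been reduced to a fuzzy set of place names with membership degrees in [0, 1], and the score is a standard fuzzy Jaccard index. The function name `fuzzy_spatial_similarity`, the example documents, and the membership values are all hypothetical.

```python
from typing import Dict

# Hypothetical representation: place name -> membership degree in [0, 1].
# A crisp reference ("in Chennai") gets degree 1.0; a vague one
# ("near Chennai") spreads lower degrees over candidate places.
FuzzyPlaces = Dict[str, float]


def fuzzy_spatial_similarity(a: FuzzyPlaces, b: FuzzyPlaces) -> float:
    """Fuzzy Jaccard index: sum of min memberships over sum of max memberships."""
    places = set(a) | set(b)
    intersection = sum(min(a.get(p, 0.0), b.get(p, 0.0)) for p in places)
    union = sum(max(a.get(p, 0.0), b.get(p, 0.0)) for p in places)
    # Two documents with no spatial references at all are treated as dissimilar.
    return intersection / union if union > 0 else 0.0


# Illustrative documents with made-up membership degrees.
doc_crisp = {"Chennai": 1.0}
doc_vague = {"Chennai": 0.5, "Kanchipuram": 0.25}
doc_other = {"Singapore": 1.0}

print(fuzzy_spatial_similarity(doc_crisp, doc_vague))
print(fuzzy_spatial_similarity(doc_crisp, doc_other))
```

A score of this kind could serve as the pairwise similarity fed to a clustering step, so that a document with a vague reference still lands near documents that name the same location crisply.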

2 Related Work

2.1 Uncertain Spatial Referencing in Natural Language

Natural language is prone to ambiguity and there is a vast amount of literature hand
