Coalition game based feature selection for text non-text separation in handwritten documents using LBP based features
Manosij Ghosh 1 & Kushal Kanti Ghosh 1 & Showmik Bhowmik 2 & Ram Sarkar 1
Received: 11 December 2019 / Revised: 23 July 2020 / Accepted: 9 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Text non-text classification is an important research problem in document image processing. Unfortunately, it has received little attention, particularly for unconstrained offline handwritten document images. For text non-text classification, researchers often employ high-dimensional feature vectors, which not only increase computation time and storage requirements but can also reduce classification accuracy because of redundant or irrelevant features. Feature selection (FS) algorithms address this by finding a relevant subset of the original feature vector. The aim of this paper is two-fold. First, we apply a coalition game based FS technique to find an optimal feature subset for classifying the components of a handwritten document image as either text or non-text. Second, five variants of a popular texture-based feature descriptor, Local Binary Pattern (LBP), along with its basic version, are fed to the FS module to identify only the useful patterns, which can pinpoint the image regions that are most informative for this classification task. To the best of our knowledge, the approach is novel in applying a coalition game based FS technique to locate feature-rich regions for text non-text classification. For experimentation, we have prepared an in-house dataset, along with its ground truth, consisting of 104 handwritten engineering class notes and laboratory copies that include handwritten and printed text, graphical components, tables, etc. Experimental outcomes confirm that the proposed approach not only reduces the feature dimension significantly but also increases the recognition ability of all six feature vectors.
Keywords Coalition game · Feature selection · Text non-text classification · LBP · Texture feature · Handwritten document
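For readers unfamiliar with the basic LBP descriptor named in the abstract, the following is a minimal sketch of the commonly used 3x3 formulation, not the authors' implementation. The function name `basic_lbp_histogram`, the neighbour ordering, and the `>=` thresholding convention are assumptions made for illustration. The five LBP variants and the coalition game based selection step are not shown.

```python
import numpy as np


def basic_lbp_histogram(image):
    """Return the normalised 256-bin histogram of basic 3x3 LBP codes.

    Hypothetical helper for illustration; border pixels are skipped.
    """
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Eight neighbour offsets, clockwise from the top-left pixel
    # (an assumed ordering; other orderings only permute the bits).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

Each of the 256 histogram bins counts one binary pattern, so a feature selection step that keeps only a subset of bins is in effect choosing which local patterns are treated as informative.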
* Showmik Bhowmik
Extended author information available on the last page of the article
Multimedia Tools and Applications
1 Introduction
Researchers have relied heavily on texture features over the years to solve various classification problems in document image processing [5]. These include region classification in printed documents [28, 34], text non-text separation in scene text images [12, 21], and many others. Text non-text separation in unconstrained offline handwritten documents is a very important area of research in document image processing [5]. In our daily life, we come across many handwritten documents in the form of class notes, handwritten reports and