A novel framework for generating handwritten datasets


Sajid Anwar1 · Bilal Mehrban2 · Musawar Ali1 · Farhan Hussain3 · Zahid Halim1

Received: 26 November 2019 / Revised: 22 July 2020 / Accepted: 4 August 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract The performance of deep learning algorithms depends heavily on the size and diversity of the data. For handwritten character recognition, however, dataset creation, segmentation, and labeling are time-consuming, laborious tasks that have received little research attention. This work proposes a novel and generic framework that automates the segmentation and labeling of handwritten datasets. First, a user collects handwritten glyphs on the proposed form. Next, based on a priori knowledge, local peaks of the horizontal and vertical projection functions are computed; these peaks are used to locate and segment individual samples automatically. To show the effectiveness of the proposed framework, a dataset of 160,000 samples is collected for an oriental language. We profile the segmentation of samples from one sheet with three approaches: manual, semi-automatic, and the proposed fully automatic approach. Compared to the manual and semi-automatic processes, the proposed approach is 120× and 65× faster, respectively. We also present the classification of this dataset by traditional and state-of-the-art machine learning algorithms.
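The projection-based segmentation step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binarized form image (1 = ink) whose ruled grid lines dominate the horizontal and vertical projections, and it replaces true local-peak detection with a simple threshold on each projection. The function names (projections, line_runs, cell_boxes) and the frac parameter are hypothetical and chosen only for illustration.

```python
def projections(img):
    """Return (row_sums, col_sums) of a binary image given as a list of rows."""
    row_sums = [sum(row) for row in img]
    col_sums = [sum(col) for col in zip(*img)]
    return row_sums, col_sums


def line_runs(profile, frac=0.8):
    """Find runs of indices whose projection value is at least frac * max.

    Each run is returned as an inclusive (start, end) pair and is assumed
    to correspond to one ruled line of the form (an assumption of this sketch).
    """
    cutoff = frac * max(profile)
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= cutoff and start is None:
            start = i                      # a run of line pixels begins
        elif value < cutoff and start is not None:
            runs.append((start, i - 1))    # the run ended on the previous index
            start = None
    if start is not None:                  # run reaches the image border
        runs.append((start, len(profile) - 1))
    return runs


def cell_boxes(img, frac=0.8):
    """Return half-open boxes (top, bottom, left, right) between grid lines."""
    row_sums, col_sums = projections(img)
    h_lines = line_runs(row_sums, frac)
    v_lines = line_runs(col_sums, frac)
    boxes = []
    for (_, top_end), (bottom_start, _) in zip(h_lines, h_lines[1:]):
        for (_, left_end), (right_start, _) in zip(v_lines, v_lines[1:]):
            boxes.append((top_end + 1, bottom_start, left_end + 1, right_start))
    return boxes


if __name__ == "__main__":
    # Synthetic 7x7 "form": grid lines on every third row and column,
    # plus a single ink pixel standing in for a handwritten glyph.
    form = [[1 if (r % 3 == 0 or c % 3 == 0) else 0 for c in range(7)]
            for r in range(7)]
    form[1][1] = 1
    for top, bottom, left, right in cell_boxes(form):
        sample = [row[left:right] for row in form[top:bottom]]
        print((top, bottom, left, right), sample)
```

Because the cells are returned in row-major order, a label can be attached to each crop from its position on the sheet, provided the layout of the form is known in advance. On real scans, skew, noise, and broken lines would call for a more robust peak detector than this fixed threshold.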

Sajid Anwar deceased.

Bilal Mehrban [email protected]
Sajid Anwar [email protected]
Musawar Ali [email protected]
Farhan Hussain [email protected]
Zahid Halim [email protected]

1 Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Swabi, Pakistan
2 University of AJK, Muzaffarabad, Pakistan
3 National University of Sciences and Technology, Islamabad, Pakistan

Multimedia Tools and Applications

Keywords Automatic segmentation · Handwritten character recognition · Deep learning algorithms

1 Introduction Alphanumeric and punctuation characters are the basic units that constitute a natural language. With optical character recognition, text interpretation, text-to-speech conversion, translation, and transliteration can be performed by a machine. It is therefore important to develop algorithms that can classify the individual components of a language. Handwritten digit recognition is difficult because of large variations in font, stroke size, and writing style. The classification of digits and letters can be learned from data with various machine learning algorithms, and researchers have achieved very high classification accuracies on alphanumeric characters with them. These techniques include linear classifiers [21], non-linear classifiers [20, 25], k-nearest neighbor algorithms [12], support vector machines (SVMs) [10], multilayer perceptron (MLP) classifiers, and convolutional neural networks (CNNs) [7, 8, 29]. In particular, CNNs are the state of the art for classifying handwritten characters. Coming to the second aspect of handwritten alph