Line and word segmentation of handwritten text document by mid-point detection and gap trailing


Inunganbi Sanasam¹ · Prakash Choudhary² · Khumanthem Manglem Singh¹

Received: 1 August 2019 / Revised: 17 June 2020 / Accepted: 21 July 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract This paper presents a text-line and word segmentation method for unconstrained handwritten documents based on the horizontal projection histogram (HPH), which is used to detect mid-points and to trail the gaps between lines. The mid-points are estimated from the HPH of the first 100 to 200 columns of the document. Then, starting from these mid-points, the gap between two consecutive lines is tracked using an HPH computed locally over a block of k rows and j columns. The HPH block is examined for various cases to locate the optimal rows that separate adjacent lines. The proposed method segments curved, touching and skewed lines, is robust to variations in writing, and is language independent. Word segmentation is not treated as a separate problem; it proceeds efficiently alongside line segmentation. As the space between neighboring lines is trailed, the vertical projection histogram (VPH) of t columns between the upper and lower separators of a line is monitored to find the optimal word separators. The algorithm is evaluated on two independent datasets in different languages (Meitei Mayek and English). On handwritten Meitei Mayek documents, text-line and word segmentation achieve 91.84% and 88.96% accuracy, respectively. On handwritten English documents, the method achieves 94.18% and 87.73% accuracy for line and word segmentation, respectively.

Keywords Line segmentation · Word segmentation · Handwritten documents · Meitei Mayek documents · English text
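The HPH and VPH steps summarized in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The page is assumed to be a binarized image stored as a list of rows (1 = ink, 0 = background). As one plausible reading, a line's mid-point is taken to be the centre row of each run of non-empty rows in the HPH of the leftmost `n_cols` columns. A word separator is assumed to be the centre of any run of at least `t` blank VPH columns, with ink on both sides, between a line's upper and lower separators. The local k × j block analysis used for gap trailing is not reproduced, and all function names are hypothetical.

```python
from typing import List, Optional

# Assumption: binarized page stored as a list of rows, 1 = ink pixel, 0 = background.
Image = List[List[int]]


def horizontal_projection(img: Image, col_start: int, col_end: int) -> List[int]:
    """HPH: count of ink pixels in each row, restricted to columns [col_start, col_end)."""
    return [sum(row[col_start:col_end]) for row in img]


def line_midpoints(img: Image, n_cols: int = 150) -> List[int]:
    """Estimate one mid-point row per text line from the HPH of the leftmost n_cols columns.

    Assumed interpretation (hypothetical): each maximal run of rows with a
    non-zero projection is one text line, and its mid-point is the run's centre row.
    """
    hph = horizontal_projection(img, 0, n_cols)
    mids: List[int] = []
    start: Optional[int] = None
    for r, value in enumerate(hph + [0]):  # trailing 0 closes a run that touches the bottom edge
        if value > 0 and start is None:
            start = r  # first row of a new line band
        elif value == 0 and start is not None:
            mids.append((start + r - 1) // 2)  # centre of rows start .. r-1
            start = None
    return mids


def vertical_projection(img: Image, top: int, bottom: int) -> List[int]:
    """VPH: count of ink pixels in each column, restricted to rows [top, bottom)."""
    return [sum(img[r][c] for r in range(top, bottom)) for c in range(len(img[0]))]


def word_separators(img: Image, top: int, bottom: int, t: int = 3) -> List[int]:
    """Return candidate word-separator columns for the line lying in rows [top, bottom).

    Assumed rule (hypothetical): a run of at least t consecutive blank VPH columns
    that has ink on both sides is a word gap; its centre column is the separator.
    """
    vph = vertical_projection(img, top, bottom)
    seps: List[int] = []
    gap_start: Optional[int] = None
    seen_ink = False
    for c, value in enumerate(vph):
        if value == 0:
            if seen_ink and gap_start is None:
                gap_start = c  # blank run begins after some ink
        else:
            if gap_start is not None and c - gap_start >= t:
                seps.append((gap_start + c - 1) // 2)  # centre of blank columns gap_start .. c-1
            gap_start = None
            seen_ink = True
    return seps
```

On a toy page with two short lines, `line_midpoints(img, n_cols=4)` returns one mid-point row per line, and calling `word_separators` with each line's band returns the centre column of the blank gap between its two words. Blank columns at the right margin are ignored because they have no ink on their right. On real pages `n_cols` would fall in the 100 to 200 range quoted above, and `t` would presumably be tuned to the typical inter-word spacing.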

Corresponding author: Inunganbi Sanasam

¹ NIT Manipur, Imphal, India

² NIT Hamirpur, Hamirpur, HP, India

Multimedia Tools and Applications

1 Introduction

Segmenting handwritten document images into lines and words is a significant problem because of the complications that arise in handwritten documents, such as irregular spacing between lines and words, and characters that touch across words and text lines. Although many algorithms have been proposed and enormous effort has been devoted to segmenting unconstrained handwritten documents into text lines and words, there is still plenty of room for improvement. Methods for text-line detection in printed documents have been extensively explored [5, 9]. Segmentation is relatively easy in that setting because printed text lines are approximately straight and parallel, so a global projection profile can separate them. Handwritten documents, by contrast, usually have irregular spacing and skewed or curved lines. Many attempts to solve the challenging task of handwritten text-line and word segmentation have been reported in the literature. The approaches can be broadly grouped as projection profile analysis [2, 4, 6, 11, 22, 23, 26, 30, 31, 33, 34]