Human Reading Based Strategies for Off-Line Arabic Word Recognition

1 LORIA, rue du jardin botanique, 54602 Villers-Lès-Nancy, France
2 ITESOFT Aimargues, Parc d'Andron - Immeuble Le Séquoia, 30470 Aimargues, France

Abstract. This paper summarizes techniques proposed for off-line Arabic word recognition. It takes the point of view of human reading, which favors an interactive mechanism between global memorization and local verification and thereby simplifies the recognition of complex scripts such as Arabic. From this perspective, selected papers are analyzed, with comments on their strategies.

1 Introduction

The literature on Arabic recognition offers several surveys, each taking a different point of view:
− Essoukri Ben Amara and Bouslama [1] stress the multiplication of information sources, from a single classifier to a combination of classifiers, with simple or hybrid choices of primitives.
− Lorigo and Govindaraju [2] consider the nature of the script (printed or handwritten), its recognition engines, and its applications.
− Amin [3] describes the nature of the method: symbolic or numeric.

We propose another survey, based on how human perception operates across a spectrum from coarse to fine (i.e. local, analytical or precise). Following this kind of perception makes it possible to better justify the choice of observations, to order them in classifier cascades, and to propose solutions when conflicts or problems arise; it also gives meaning to the entire recognition chain.

2 Human Perception of Arabic Writing

Arabic is a calligraphic script. It favors a global rendering of the whole word, and the detail of an individual letter is often thinned, crushed, or sketched so that it contributes to the embellishment of the word as a unit (see Figure 1).

Fig. 1. Same word written with different possible elongations as described in [1]

D.S. Doermann and S. Jaeger (Eds.): SACH 2006, LNCS 4768, pp. 36–56, 2008. © Springer-Verlag Berlin Heidelberg 2008

Thus, a letter can assume one to four different forms according to its position in the word. The global form becomes the recognized one and the letter recedes into the background, favoring the overall appearance (see Figure 2). Consequently, the alphabet to be recognized grows to approximately 100 possible forms [1].
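The dependence of letter shape on position can be illustrated with a minimal sketch. It relies on an assumption that is not part of the surveyed work: it uses the positional variants encoded in the Unicode "Arabic Presentation Forms-B" block as a stand-in for the written forms. The helper name positional_forms is hypothetical, and the counts it reports reflect Unicode's encoding, not the figure of roughly 100 forms quoted from [1].

```python
import unicodedata

# Positional variants as named in the Unicode character database.
POSITIONS = ("ISOLATED", "INITIAL", "MEDIAL", "FINAL")


def positional_forms(letter_name):
    """Return {position: character} for the forms Unicode encodes for a letter.

    `letter_name` is the Unicode letter name, e.g. "BEH" or "ALEF".
    Positions with no encoded presentation form are simply left out.
    """
    forms = {}
    for position in POSITIONS:
        try:
            forms[position] = unicodedata.lookup(
                f"ARABIC LETTER {letter_name} {position} FORM"
            )
        except KeyError:
            # No presentation form is encoded for this letter and position.
            pass
    return forms


# Print how many positional forms each sample letter has, and which ones.
for name in ("HAMZA", "ALEF", "BEH"):
    forms = positional_forms(name)
    print(name, len(forms), sorted(forms))
```

In this encoding HAMZA has a single form, ALEF has two, and BEH has all four, which matches the "one to four" range stated above. Calligraphic styles and ligatures add further variation that this sketch does not capture.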

Fig. 2. Examples of Arabic font styles as described in [4]

However, to facilitate the reading of calligraphy, diacritics and accents take priority when deciphering letters that share similar base shapes. In addition, so that the writer is not forced to keep the pen in contact with the paper, Arabic words decompose into PAWs (Parts of Arabic Word), which introduce pauses in the writing that influence the recognition process. PAWs make the script easier to apprehend and simplify linear recognition. Figure 3 gives an example of the complexity of Arabic writing, with sub-words and diacritic information.

Fig. 3. Arabic writing complexity: example of a handwritten word as shown in [28]
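The decomposition of a word into PAWs can be sketched at the level of encoded text. This is only an illustration under stated assumptions, not a method from the surveyed papers: an off-line recognizer has to find PAWs in the image (for instance from connected components), whereas this sketch works on a Unicode string. The function name split_into_paws and the list of letters that do not connect to the following letter are assumptions, and the list is not exhaustive.

```python
import unicodedata

# Assumed, non-exhaustive list of letters that do not connect to the next
# letter: alef variants, dal, thal, reh, zain, waw with hamza, waw.
NON_FORWARD_JOINING = set(
    "\u0622\u0623\u0625\u0627\u062F\u0630\u0631\u0632\u0624\u0648"
)
HAMZA = "\u0621"  # standalone hamza connects on neither side


def split_into_paws(word):
    """Split an Arabic word (a Unicode string) into its PAWs (sub-words)."""
    paws = []
    current = ""
    for ch in word:
        if unicodedata.combining(ch) and not current and paws:
            # A vowel mark following a letter that just closed a PAW
            # belongs to that letter, so attach it to the previous PAW.
            paws[-1] += ch
            continue
        if ch == HAMZA and current:
            # Hamza does not join the preceding letter: close the open PAW.
            paws.append(current)
            current = ""
        current += ch
        if ch in NON_FORWARD_JOINING or ch == HAMZA:
            # The pen is lifted after this letter: the PAW ends here.
            paws.append(current)
            current = ""
    if current:
        paws.append(current)
    return paws


# "madrasa" (school): meem-dal | reh | seen-teh marbuta
print(split_into_paws("\u0645\u062F\u0631\u0633\u0629"))
```

Each returned element corresponds to one stretch written without lifting the pen, so the length of the list gives the number of pauses plus one for the word.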

Considering the reading process