Sequential Multiple String Matching

  • PDF / 253,895 Bytes
  • 4 Pages / 547.087 x 737.008 pts Page_size
  • 18 Downloads / 233 Views

DOWNLOAD

REPORT


S

Sequential Multiple String Matching

siders sparse q-grams and thus avoids to scan a lot of text positions. It is due to Fredriksson and Grabowski [7]. Applications The methods which are described here apply to the treatment of the natural language, the treatment and analysis of genetic sequences and of musical sequences, the problems of safety related to data flows like virus detection, and the management of textual data bases, to quote only some immediate applications. Open Problems There remain only a few open problems on this question. It is still unknown if it is possible to design an average optimal time constant space string matching algorithm. The exact size of the Boyer-Moore automaton is still unknown (see [5]). Experimental Results The book of G. Navarro and M. Raffinot [12] is a good introduction and presents an experimental map of ESM algorithms for different alphabet sizes and pattern lengths. Basically, the Shift-Or algorithm is efficient for small alphabets and short patterns, the BNDM algorithm is efficient for medium size alphabets and medium length patterns, the Horspool algorithm is efficient for large alphabets, and the BOM algorithm is efficient for long patterns. URL to Code The site monge.univ-mlv.fr/~lecroq/string presents a large number of ESM algorithms (see also [2]). Each algorithm is implemented in C code and a Java applet is given. Cross References  Indexed approximate string matching refers to the case where the text is preprocessed;  Regular expression matching is the more complex case where P can be a regular expression.  Sequential approximate string matching is the version where errors are permitted;  Sequential multiple string matching is the version where a finite set of patterns is searched in a text; Recommended Reading 1. Allauzen, C., Crochemore, M., Raffinot, M.: Factor oracle: a new structure for pattern matching. In: SOFSEM’99. LNCS, vol. 1725, pp. 291–306. Springer, Berlin (1999)

2. Charras, C., Lecroq, T.: Handbook of exact string matching algorithms. King’s College London Publications, London (2004) 3. Cole, R., Hariharan, R., Paterson, M., Zwick, U.: Tighter lower bounds on the exact complexity of string matching. SIAM J. Comput. 24(1), 30–45 (1995) 4. Crochemore, M., Czumaj, A., Gasieniec, ˛ L., Jarominek, S., Lecroq, T., Plandowski, W., Rytter, W.: Speeding up two string matching algorithms. Algorithmica 12(4/5), 247–267 (1994) 5. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on strings. Cambridge University Press, New York (2007) 6. Crochemore, M., Perrin, D.: Two-way string matching. J. ACM 38(3), 651–675 (1991) 7. Fredriksson, K., Grabowski, S.: Practical and optimal string matching. In: Proceedings of SPIRE’2005. LNCS, vol. 3772, pp. 374–385. Springer, Berlin (2005) 8. Galil, Z., Seiferas, J.: Time-space optimal string matching. J. Comput. Syst. Sci. 26(3), 280–294 (1983) 9. Gusfield, D.: Algorithms on strings, trees and sequences. Cambridge University Press, Cambridge, UK (1997) 10. Hancart, C.: On Simon’s string searching algorithm. Inf. Process. Lett. 47(2)