Suffix Tree Construction in RAM
- PDF / 233,334 Bytes
- 4 Pages / 547.087 x 737.008 pts Page_size
- 43 Downloads / 162 Views
technique to identify suffix subranges. Unlike Prepar, DynaCluster does not scan over and over the string S, but it starts from the q-based subranges and then splits them recursively in a DFS-manner if their size is larger than a fixed threshold . Splitting is implemented by looking at the next q characters of the suffixes in the subrange. This clustering and lazy-DFS visit of TS significantly reduce the number of I/Os incurred by the frequent edgesplitting operations that occur during the suffix tree construction process; and allow it to cope efficiently with skew data. As a result, DynaCluster constructs suffix trees for 200Mbps with only 16 Mb internal memory. More recently, [15] improved the space requirement and the buffering efficiency, thus being able to construct a suffix tree of 3 Gbps in 30 hours; whereas [1] improved the I/O behavior of RAM-algorithms for online suffix-tree construction, by devising a novel low-overhead buffering policy.
10. Ko, P., Aluru, S.: Optimal self-adjusting trees for dynamic string data in secondary storage. In: Symposium on String Processing and Information Retrieval (SPIRE). LNCS, vol. 4726, pp. 184-194. Springer, Berlin (2007) 11. Mäkinen, V., Navarro, G.: Dynamic Entropy-Compressed Sequences and Full-Text Indexes. In: Proc. 17th Symposium on Combinatorial Pattern Matching (CPM). LNCS, vol. 4009, pp. 307–318. Springer, Berlin (2006) 12. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22, 935–948 (1993) 13. Navarro, G., Baeza-Yates, R.: A hybrid indexing method for approximate string matching. J. Discret. Algorithms 1, 21–49 (2000) 14. Navarro, G., Mäkinen, V.: Compressed full text indexes. ACM Comput. Surv. 39(1) (2007) 15. Tata, S., Hankins, R.A., Patel, J.M.: Practical suffix tree construction. In: Proc. 13th International Conference on Very Large Data Bases (VLDB), pp. 36–47, Toronto, Canada (2004) 16. Vitter, J.: External memory algorithms and data structures: Dealing with MASSIVE DATA. ACM Comput. Surv. 33, 209–271 (2002)
Cross References
Suffix Tree Construction in RAM
Cache-Oblivious Sorting String Sorting Suffix Array Construction Suffix Tree Construction in RAM Text Indexing
1997; Farach-Colton
Recommended Reading 1. Bedathur, S.J., Haritsa, J.R.: Engineering a fast online persistent suffix tree construction., In: Proc. 20th International Conference on Data Engineering, pp. 720–731, Boston, USA (2004) 2. Cheung, C., Yu, J., Lu, H.: Constructing suffix tree for gigabyte sequences with megabyte memory. IEEE Trans. Knowl. Data Eng. 17, 90–105 (2005) 3. Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47 987–1011 (2000) 4. Ferragina, P.: Handbook of Computational Molecular Biology. In: Computer and Information Science Series, ch. 35 on “String search in external memory: algorithms and data structures”. Chapman & Hall/CRC, Florida (2005) 5. Ferragina, P., Grossi, R.: The string B-tree: A new data structure for string search in external memory and its applications. J.
Data Loading...