Documente online.
Zona de administrare documente. Fisierele tale
Am uitat parola x Creaza cont nou
 HomeExploreaza
upload
Upload




A suffix array

computers


A suffix array is an index structure for locating substrings in a larger string. It is an array of indexes giving the lexicographical order of the suffixes. A suffix array for a string S of length n can be built in Θ(n) time, and lets you find all occ occurrences of a pattern P of length m in S in O(m + log n + occ) time. Suffix arrays were originally developed by Gene Myers and Udi Manber to reduce memory consum 141o141b ption compared to a suffix tree. This began the trend towards compressed suffix arrays and BWT-based compressed full-text indices.



The kth suffix of a string S is S with the first k-1 characters removed, for some starting position k. A string of n characters has n suffixes, denoted by their starting positions 1..n. For instance, the string "abracadabra" has the following suffixes:

abracadabra
bracadabra
racadabra
acadabra
cadabra
adabra
dabra
abra
bra
ra
a

The first step in building a suffix array is to sort the suffixes lexicographically, giving:

11 a
8 abra
abracadabra
acadabra
6 adabra
9 bra
2 bracadabra
5 cadabra
7 dabra
ra
3 racadabra

To find where a given pattern P is a substring of S, two binary searches are used to find the range of suffixes prefixed by P. The output of the search is the list of starting positions in this range. A search for "br" in the given example would give a left border 6 and a right border 7, giving suffixes 9 and 2. This means that "br" is a substring of "abracadabra" both at position 2 and 9. If implemented straightforward, this binary search takes O(m log n) time, as most of the pattern P is compared at every step in the binary search in the worst case. To avoid redoing comparisons, extra data structures giving information about the longest common prefixes (LCP's) of suffixes are constructed, giving O(m + log n) search time.

The key insight of the suffix array is to denote each suffix by its starting position only (the second column above). The resulting array of numbers, combined with the original string, is a compact representation of the sorted suffix list, consuming one character and one integer for each character in the string.

There are many suffix array construction algorithms, with different properties. Some O(n2) construction algorithms are faster than the Θ(n) algorithms in practise.

[edit]

References

  • Udi Manber and Gene Myers (1991). "Suffix arrays: a new method for on-line string searches". SIAM Journal on Computing, Volume 22, Issue 5 (October 1993), pp. 935-948.
  • Pang Ko and Srinivas Aluru (2003). "Space efficient linear time construction of suffix arrays." In Combinatorial Pattern Matching (CPM 03). LNCS 2676, Springer, 2003, pp 203-210.
  • Juha Kärkkäinen and Peter Sanders (2003). "Simple linear work suffix array construction." In Proc. 30th International Colloquium on Automata, Languages and Programming (ICALP '03). LNCS 2719, Springer, 2003, pp. 943-955.
  • Klaus-Bernd Schürmann and Jens Stoye (2005). "An incomplex algorithm for fast suffix array construction". In Proceedings of ALENEX, 2005.

Retrieved from "https://en.wikipedia.org/wiki/Suffix_array"


Document Info


Accesari: 1402
Apreciat: hand-up

Comenteaza documentul:

Nu esti inregistrat
Trebuie sa fii utilizator inregistrat pentru a putea comenta


Creaza cont nou

A fost util?

Daca documentul a fost util si crezi ca merita
sa adaugi un link catre el la tine in site


in pagina web a site-ului tau.




eCoduri.com - coduri postale, contabile, CAEN sau bancare

Politica de confidentialitate | Termenii si conditii de utilizare




Copyright © Contact (SCRIGROUP Int. 2024 )