Boyer–Moore string-search algorithm

In computer science, the Boyer–Moore string-search algorithm is an efficient string-searching algorithm that is the standard benchmark for practical string search in the literature.^[1] It was developed by Robert S. Boyer and J Strother Moore.^[2] The algorithm preprocesses the string being searched for (the pattern), but not the string being searched in (the text). It is therefore well suited to applications in which the pattern persists across multiple searches while the text does not. The Boyer–Moore algorithm uses information gathered during preprocessing to skip sections of the text, resulting in a lower constant factor than many other string-search algorithms. In general, the algorithm runs faster as the pattern length increases.

Definitions

A	N	P	A	N	M	A	N	-
P	A	N	-	-	-	-	-	-
-	P	A	N	-	-	-	-	-
-	-	P	A	N	-	-	-	-
-	-	-	P	A	N	-	-	-
-	-	-	-	P	A	N	-	-
-	-	-	-	-	P	A	N	-

Alignments of the pattern PAN to the text ANPANMAN,
from k=3 to k=8. A match occurs at k=5.

S[i] refers to the character at index i of string S, counting from 1.
S[i..j] refers to the substring of string S starting at index i and ending at j, inclusive.
A prefix of S is a substring S[1..i] for some i in the range [1, n], where n is the length of S.
A suffix of S is a substring S[i..n] for some i in the range [1, n], where n is the length of S.
The string being searched for is called the pattern and is denoted by the symbol P.
The string being searched in is called the text and is denoted by the symbol T.
The length of P is n.
The length of T is m.
An alignment of P to T is an index k in T such that the last character of P is aligned with index k of T.
A match or occurrence of P occurs at an alignment k if P is equal to T[(k-n+1)..k].
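
The definitions above can be made concrete with a minimal sketch that tries every alignment in turn. The function name `occurrences_by_alignment` is hypothetical and chosen only for illustration; the sketch uses the article's 1-based alignment k and translates it to Python's 0-based slices. It is the brute-force baseline, not the Boyer–Moore algorithm itself.

```python
from typing import List


def occurrences_by_alignment(P: str, T: str) -> List[int]:
    """Return every alignment k (1-based index in T of P's last character) where P occurs."""
    n, m = len(P), len(T)
    found = []
    for k in range(n, m + 1):      # the m - n + 1 alignments k = n .. m
        # T[(k-n+1)..k] in 1-based inclusive notation is the Python slice T[k-n:k].
        if T[k - n:k] == P:
            found.append(k)
    return found


print(occurrences_by_alignment("PAN", "ANPANMAN"))
```

For the pattern PAN and the text ANPANMAN this reports the single alignment k = 5, the same one shown in the table of alignments.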

Description

The Boyer–Moore algorithm searches for occurrences of P in T by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are m-n+1), Boyer–Moore uses information gained by preprocessing P to skip as many alignments as possible.

The algorithm begins at alignment k = n, so that the start of P is aligned with the start of T. Characters in P and T are then compared starting at index n in P and index k in T, moving backward: the strings are matched from the end of P toward its start. The comparisons continue until either a mismatch occurs or the beginning of P is reached (which means there is a match), after which the alignment is shifted to the right by the maximum amount permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted past the end of T.

The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of P.

Shift rules

The bad character rule

Description

-	-	-	-	X	-	-	K	-	-	-
A	N	P	A	N	M	A	N	A	M	-
-	N	N	A	A	M	A	N	-	-	-
-	-	-	N	N	A	A	M	A	N	-

Demonstration of the bad character rule
with the pattern NNAAMAN.

The bad character rule considers the character in T at which the comparison failed (assuming such a failure occurred). The next occurrence of that character to the left in P is found, and a shift that brings that occurrence in line with the mismatched character in T is proposed. If the mismatched character does not occur to the left in P, a shift is proposed that moves the whole of P past the point of mismatch.
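
As a minimal sketch of the rule just described, the shift can be computed with a direct leftward scan of the pattern. The helper name `bad_character_shift` is hypothetical, indices are 0-based as usual in Python, and the linear scan via `str.rfind` stands in for the constant-time table that a real implementation would precompute.

```python
def bad_character_shift(P: str, i: int, c: str) -> int:
    """Shift proposed by the bad character rule.

    i is the 0-based index in P at which the mismatch occurred and
    c is the mismatched character of T.
    """
    # Rightmost occurrence of c strictly to the left of i, or -1 if there is none.
    j = P.rfind(c, 0, i)
    # With j == -1 the shift is i + 1, which moves P entirely past the mismatch.
    return i - j


print(bad_character_shift("NNAAMAN", 3, "N"))
```

In the demonstration with the pattern NNAAMAN, the mismatch occurs at the fourth pattern character (0-based index 3) against the text character N; the nearest N to the left is at index 1, so the proposed shift is 2, which is the shift shown in the table.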

Preprocessing

Methods vary in the exact form the table for the bad character rule should take, but a simple solution with constant-time lookup is as follows: create a 2D table indexed first by the index of the character c in the alphabet and second by the index i in the pattern. A lookup returns the occurrence of c in P with the next-highest index j < i, or -1 if there is no such occurrence. The proposed shift is then i - j, with O(1) lookup time and O(kn) space, assuming a finite alphabet of size k.

The good suffix rule

Description

-	-	-	-	X	-	-	K	-	-	-	-	-
M	A	N	P	A	N	A	M	A	N	A	P	-
A	N	A	M	P	N	A	M	-	-	-	-	-
-	-	-	-	A	N	A	M	P	N	A	M	-

Demonstration of good suffix rule with pattern ANAMPNAM.

The good suffix rule is markedly more complex in both concept and implementation than the bad character rule. It is the reason comparisons begin at the end of the pattern rather than at the start, and it is formally stated as follows:^[3]

Suppose that for a given alignment of P and T, a substring t of T matches a suffix of P, but a mismatch occurs at the next comparison to the left. Then find, if it exists, the rightmost copy t' of t in P such that t' is not a suffix of P and the character to the left of t' in P differs from the character to the left of t in P. Shift P to the right so that substring t' in P is below substring t in T. If t' does not exist, then shift the left end of P past the left end of t in T by the least amount such that a prefix of the shifted pattern matches a suffix of t in T. If no such shift is possible, then shift P by n places to the right. If an occurrence of P is found, then shift P by the least amount such that a proper prefix of the shifted P matches a suffix of the occurrence of P in T. If no such shift is possible, then shift P by n places, that is, shift P past the matched occurrence in T.

Preprocessing

The good suffix rule requires two tables: one for use in the general case, and another for use when either the general case returns no meaningful result or a match occurs. These tables are designated L and H respectively. Their definitions are as follows:^[3]

For each i, L[i] is the largest position less than n such that the string P[i..n] matches a suffix of P[1..L[i]] and such that the character preceding that suffix is not equal to P[i-1]. L[i] is defined to be zero if no position satisfies the condition.

Let H[i] denote the length of the largest suffix of P[i..n] that is also a prefix of P, if one exists. If none exists, H[i] is zero.

Both tables can be constructed in O(n) time and use O(n) space. The alignment shift for index i in P is given by n - L[i] or n - H[i]. H should be used only if L[i] is zero or a match has been found.
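
The two definitions can be checked against a small example with a deliberately naive sketch that applies them literally. This is not the O(n) construction mentioned above; it takes polynomial time and exists only to make the definitions tangible. The function name `good_suffix_tables_naive` is hypothetical, the returned lists are 1-based (indices 0 and 1 are unused), and the handling of a copy that starts at the very beginning of P, where no preceding character exists, is an assumption of this sketch.

```python
from typing import List, Tuple


def good_suffix_tables_naive(P: str) -> Tuple[List[int], List[int]]:
    """Build L and H directly from their definitions (1-based; indices 0 and 1 unused)."""
    n = len(P)
    L = [0] * (n + 1)
    H = [0] * (n + 1)
    for i in range(2, n + 1):
        suffix = P[i - 1:]                      # P[i..n] in 1-based notation
        length = len(suffix)
        # Candidate end positions j < n, tried from largest to smallest.
        for j in range(n - 1, length - 1, -1):
            start = j - length + 1              # 1-based start of the copy
            copy_matches = P[start - 1:j] == suffix
            # Assumption: a copy at the very start of P has no preceding
            # character and is treated as satisfying the condition.
            differs = start == 1 or P[start - 2] != P[i - 2]
            if copy_matches and differs:
                L[i] = j
                break
        # Longest suffix of P[i..n] that is also a prefix of P.
        for size in range(length, 0, -1):
            if P[n - size:] == P[:size]:
                H[i] = size
                break
    return L, H


L, H = good_suffix_tables_naive("ANAMPNAM")
print(len("ANAMPNAM") - L[6])  # shift after matching "NAM" and mismatching at index 5
```

For the pattern ANAMPNAM the matched suffix NAM starts at index 6, the only other copy of NAM ends at position 4 and is preceded by A rather than P, so L[6] = 4 and the shift n - L[6] is 4, in agreement with the good suffix demonstration.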

The Galil rule

A simple but important optimization of Boyer–Moore was put forward by Galil in 1979.^[4] Unlike the shift rules, the Galil rule speeds up the comparisons performed at each alignment by skipping sections that are already known to match. Suppose that at an alignment k₁, P is compared with T down to character c of T. If P is then shifted to k₂ such that its left end lies between c and k₁, then in the next comparison phase a prefix of P must match the substring T[(k₂ - n + 1)..k₁]. Thus, if the comparisons get down to position k₁ of T, an occurrence of P can be recorded without explicitly comparing past k₁. Besides improving the efficiency of Boyer–Moore, the Galil rule is needed to guarantee linear running time in the worst case.

Performance

The Boyer–Moore algorithm as presented in the original paper has a worst-case running time of O(n+m) only if the pattern does not appear in the text. This was first proved by Knuth, Morris, and Pratt in 1977,^[5] followed by Guibas and Odlyzko in 1980^[6] with an upper bound of 5m comparisons in the worst case. Cole gave a proof with an upper bound of 3m comparisons in the worst case in 1991.^[7]

When the pattern does occur in the text, the running time of the original algorithm is O(nm) in the worst case. This is easy to see when both the pattern and the text consist solely of the same repeated character. However, including the Galil rule yields linear running time in all cases.^[4]^[7]
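
The quadratic case can be illustrated with a small counting sketch. It is a simplified model rather than a full Boyer–Moore implementation: the hypothetical function `comparisons_without_galil` builds a pattern and a text from one repeated character, scans each alignment from right to left, and assumes a shift of one position after every occurrence, which is the only safe shift for a pattern whose period is 1. It assumes 1 <= n <= m.

```python
def comparisons_without_galil(n: int, m: int) -> int:
    """Count character comparisons for P = 'a' * n in T = 'a' * m."""
    P, T = "a" * n, "a" * m
    comparisons = 0
    k = n - 1                       # 0-based index in T of the last character of P
    while k < m:
        i, h = n - 1, k
        while i >= 0:               # scan right to left, as Boyer-Moore does
            comparisons += 1
            if P[i] != T[h]:
                break
            i -= 1
            h -= 1
        k += 1                      # assumed shift after an occurrence (period 1)
    return comparisons


print(comparisons_without_galil(4, 10))
```

Every one of the m - n + 1 alignments is a full match costing n comparisons, so the count is n(m - n + 1), for example 28 for n = 4 and m = 10. With the Galil rule the already-verified overlap would not be compared again.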

Implementations

Various implementations exist in different programming languages. In C++, Boost provides a generic Boyer–Moore search implementation in its Algorithm library.

Python implementation

from typing import List
# This version is sensitive to the English alphabet in ASCII for case-insensitive matching.
# To remove this feature, define alphabet_index as ord(c), and replace instances of "26"
# with "256" or any maximum code-point you want. For Unicode you may want to match in UTF-8
# bytes instead of creating a 0x10FFFF-sized table.

ALPHABET_SIZE = 26

def alphabet_index(c: str) -> int:
    """Return the index of the given character in the English alphabet, counting from 0."""
    val = ord(c.lower()) - ord("a")
    assert val >= 0 and val < ALPHABET_SIZE
    return val

def match_length(S: str, idx1: int, idx2: int) -> int:
    """Return the length of the match of the substrings of S beginning at idx1 and idx2."""
    if idx1 == idx2:
        return len(S) - idx1
    match_count = 0
    while idx1 < len(S) and idx2 < len(S) and S[idx1] == S[idx2]:
        match_count += 1
        idx1 += 1
        idx2 += 1
    return match_count

def fundamental_preprocess(S: str) -> List[int]:
    """Return Z, the Fundamental Preprocessing of S.

    Z[i] is the length of the substring beginning at i which is also a prefix of S.
    This pre-processing is done in O(n) time, where n is the length of S.
    """
    if len(S) == 0:  # Handles case of empty string
        return []
    if len(S) == 1:  # Handles case of single-character string
        return [1]
    z = [0 for x in S]
    z[0] = len(S)
    z[1] = match_length(S, 0, 1)
    for i in range(2, 1 + z[1]):  # Optimization from exercise 1-5
        z[i] = z[1] - i + 1
    # Defines lower and upper limits of z-box
    l = 0
    r = 0
    for i in range(2 + z[1], len(S)):
        if i <= r:  # i falls within existing z-box
            k = i - l
            b = z[k]
            a = r - i + 1
            if b < a:  # b ends within existing z-box
                z[i] = b
            else:  # b ends at or after the end of the z-box, we need to do an explicit match to the right of the z-box
                z[i] = a + match_length(S, a, r + 1)
                l = i
                r = i + z[i] - 1
        else:  # i does not reside within existing z-box
            z[i] = match_length(S, 0, i)
            if z[i] > 0:
                l = i
                r = i + z[i] - 1
    return z

def bad_character_table(S: str) -> List[List[int]]:
    """
    Generates R for S, which is an array indexed by the position of some character c in the
    English alphabet. At that index in R is an array of length |S|+1, specifying for each
    index i in S (plus the index after S) the next location of character c encountered when
    traversing S from right to left starting at i. This is used for a constant-time lookup
    for the bad character rule in the Boyer-Moore string search algorithm, although it has
    a much larger size than non-constant-time solutions.
    """
    if len(S) == 0:
        return [[] for a in range(ALPHABET_SIZE)]
    R = [[-1] for a in range(ALPHABET_SIZE)]
    alpha = [-1 for a in range(ALPHABET_SIZE)]
    for i, c in enumerate(S):
        alpha[alphabet_index(c)] = i
        for j, a in enumerate(alpha):
            R[j].append(a)
    return R

def good_suffix_table(S: str) -> List[int]:
    """
    Generates L for S, an array used in the implementation of the strong good suffix rule.
    L[i] = k, the largest position in S such that S[i:] (the suffix of S starting at i) matches
    a suffix of S[:k] (a substring in S ending at k). Used in Boyer-Moore, L gives an amount to
    shift P relative to T such that no instances of P in T are skipped and a suffix of P[:L[i]]
    matches the substring of T matched by a suffix of P in the previous match attempt.
    Specifically, if the mismatch took place at position i-1 in P, the shift magnitude is given
    by the equation len(P) - 1 - L[i] (indices are 0-based). In the case that L[i] = -1, the
    full shift table is used.
    Since only proper suffixes matter, L[0] = -1.
    """
    L = [-1 for c in S]
    N = fundamental_preprocess(S[::-1])  # S[::-1] reverses S
    N.reverse()
    for j in range(0, len(S) - 1):
        i = len(S) - N[j]
        if i != len(S):
            L[i] = j
    return L

def full_shift_table(S: str) -> List[int]:
    """
    Generates F for S, an array used in a special case of the good suffix rule in the Boyer-Moore
    string search algorithm. F[i] is the length of the longest suffix of S[i:] that is also a
    prefix of S. In the cases it is used, the shift magnitude of the pattern P relative to the
    text T is len(P) - F[i] for a mismatch occurring at i-1.
    """
    F = [0 for c in S]
    Z = fundamental_preprocess(S)
    longest = 0
    for i, zv in enumerate(reversed(Z)):
        longest = max(zv, longest) if zv == i + 1 else longest
        F[-i - 1] = longest
    return F

def string_search(P, T) -> List[int]:
    """
    Implementation of the Boyer-Moore string search algorithm. This finds all occurrences of P
    in T, and incorporates numerous ways of pre-processing the pattern to determine the optimal
    amount to shift the string and skip comparisons. In practice it runs in O(m) (and even
    sublinear) time, where m is the length of T. This implementation performs a case-insensitive
    search on ASCII alphabetic characters, spaces not included.
    """
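    # Normalize case so that comparisons agree with alphabet_index and the docstring
    P, T = P.lower(), T.lower()
    if len(P) == 0 or len(T) == 0 or len(T) < len(P):
        return []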

    matches = []

    # Preprocessing
    R = bad_character_table(P)
    L = good_suffix_table(P)
    F = full_shift_table(P)

    k = len(P) - 1      # Represents alignment of end of P relative to T
    previous_k = -1     # Represents alignment in previous phase (Galil's rule)
    while k < len(T):
        i = len(P) - 1  # Character to compare in P
        h = k           # Character to compare in T
        while i >= 0 and h > previous_k and P[i] == T[h]:  # Matches starting from end of P
            i -= 1
            h -= 1
        if i == -1 or h == previous_k:  # Match has been found (Galil's rule)
            matches.append(k - len(P) + 1)
            k += len(P) - F[1] if len(P) > 1 else 1
        else:  # No match, shift by max of bad character and good suffix rules
            char_shift = i - R[alphabet_index(T[h])][i]
            if i + 1 == len(P):  # Mismatch happened on first attempt
                suffix_shift = 1
            elif L[i + 1] == -1:  # Matched suffix does not appear anywhere in P
                suffix_shift = len(P) - F[i + 1]
            else:               # Matched suffix appears in P
                suffix_shift = len(P) - 1 - L[i + 1]
            shift = max(char_shift, suffix_shift)
            previous_k = k if shift >= i + 1 else previous_k  # Galil's rule
            k += shift
    return matches

C implementation

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET_LEN 256
#define max(a, b) ((a < b) ? b : a)

// BAD CHARACTER RULE.
// delta1 table: delta1[c] contains the distance between the last
// character of pat and the rightmost occurrence of c in pat.
//
// If c does not occur in pat, then delta1[c] = patlen.
// If c is at string[i] and c != pat[patlen-1], we can safely shift i
//   over by delta1[c], which is the minimum distance needed to shift
//   pat forward to get string[i] lined up with some character in pat.
// c == pat[patlen-1] returning zero is only a concern for BMH, which
//   does not have delta2. BMH makes the value patlen in such a case.
//   We follow this choice instead of the original 0 because it skips
//   more. (correctness?)
//
// This algorithm runs in alphabet_len+patlen time.
void make_delta1(ptrdiff_t *delta1, uint8_t *pat, size_t patlen) {
    for (int i=0; i < ALPHABET_LEN; i++) {
        delta1[i] = patlen;
    }
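    // Cover every character except the last one (indices 0 .. patlen-2);
    // writing the bound as i + 1 < patlen avoids unsigned underflow
    for (size_t i=0; i + 1 < patlen; i++) {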
        delta1[pat[i]] = patlen-1 - i;
    }
}

// true if the suffix of word starting from word[pos] is a prefix
// of word
bool is_prefix(uint8_t *word, size_t wordlen, ptrdiff_t pos) {
    int suffixlen = wordlen - pos;
    // could also use the strncmp() library function here
    // return ! strncmp(word, &word[pos], suffixlen);
    for (int i = 0; i < suffixlen; i++) {
        if (word[i] != word[pos+i]) {
            return false;
        }
    }
    return true;
}

// length of the longest suffix of word ending on word[pos].
// suffix_length("dddbcabc", 8, 4) = 2
size_t suffix_length(uint8_t *word, size_t wordlen, ptrdiff_t pos) {
    size_t i;
    // increment suffix length i to the first mismatch or beginning
    // of the word
    for (i = 0; (word[pos-i] == word[wordlen-1-i]) && (i < pos); i++);
    return i;
}

// GOOD SUFFIX RULE.
// delta2 table: given a mismatch at pat[pos], we want to align
// with the next possible full match could be based on what we
// know about pat[pos+1] to pat[patlen-1].
//
// In case 1:
// pat[pos+1] to pat[patlen-1] does not occur elsewhere in pat,
// the next plausible match starts at or after the mismatch.
// If, within the substring pat[pos+1 .. patlen-1], lies a prefix
// of pat, the next plausible match is here (if there are multiple
// prefixes in the substring, pick the longest). Otherwise, the
// next plausible match starts past the character aligned with
// pat[patlen-1].
//
// In case 2:
// pat[pos+1] to pat[patlen-1] does occur elsewhere in pat. The
// mismatch tells us that we are not looking at the end of a match.
// We may, however, be looking at the middle of a match.
//
// The first loop, which takes care of case 1, is analogous to
// the KMP table, adapted for a 'backwards' scan order with the
// additional restriction that the substrings it considers as
// potential prefixes are all suffixes. In the worst case scenario
// pat consists of the same letter repeated, so every suffix is
// a prefix. This loop alone is not sufficient, however:
// Suppose that pat is "ABYXCDBYX", and text is ".....ABYXCDEYX".
// We will match X, Y, and find B != E. There is no prefix of pat
// in the suffix "YX", so the first loop tells us to skip forward
// by 9 characters.
// Although superficially similar to the KMP table, the KMP table
// relies on information about the beginning of the partial match
// that the BM algorithm does not have.
//
// The second loop addresses case 2. Since suffix_length may not be
// unique, we want to take the minimum value, which will tell us
// how far away the closest potential match is.
void make_delta2(ptrdiff_t *delta2, uint8_t *pat, size_t patlen) {
    ptrdiff_t p; // signed index; ssize_t is not declared by the headers included above
    size_t last_prefix_index = patlen-1;

    // first loop
    for (p=patlen-1; p>=0; p--) {
        if (is_prefix(pat, patlen, p+1)) {
            last_prefix_index = p+1;
        }
        delta2[p] = last_prefix_index + (patlen-1 - p);
    }

    // second loop
    for (p=0; p < patlen-1; p++) {
        size_t slen = suffix_length(pat, patlen, p);
        if (pat[p - slen] != pat[patlen-1 - slen]) {
            delta2[patlen-1 - slen] = patlen-1 - p + slen;
        }
    }
}

// Returns pointer to first match.
// See also glibc memmem() (non-BM) and std::boyer_moore_searcher (first-match).
uint8_t* boyer_moore (uint8_t *string, size_t stringlen, uint8_t *pat, size_t patlen) {
    // The empty pattern must be handled before any table is built:
    // a zero-length VLA and the table builders are undefined for patlen == 0
    if (patlen == 0) {
        return string;
    }

    ptrdiff_t delta1[ALPHABET_LEN];
    ptrdiff_t delta2[patlen]; // C99 VLA
    make_delta1(delta1, pat, patlen);
    make_delta2(delta2, pat, patlen);

    size_t i = patlen - 1;        // str-idx
    while (i < stringlen) {
        ptrdiff_t j = patlen - 1; // pat-idx
        while (j >= 0 && (string[i] == pat[j])) {
            --i;
            --j;
        }
        if (j < 0) {
            return &string[i+1];
        }

        ptrdiff_t shift = max(delta1[string[i]], delta2[j]);
        i += shift;
    }
    return NULL;
}

Java implementation

    /**
     * Returns the index within this string of the first occurrence of the
     * specified substring. If it is not a substring, return -1.
     *
     * There is no Galil because it only generates one match.
     *
     * @param haystack The string to be scanned
     * @param needle The target string to search
     * @return The start index of the substring
     */
    public static int indexOf(char[] haystack, char[] needle) {
        if (needle.length == 0) {
            return 0;
        }
        int charTable[] = makeCharTable(needle);
        int offsetTable[] = makeOffsetTable(needle);
        for (int i = needle.length - 1, j; i < haystack.length;) {
            for (j = needle.length - 1; needle[j] == haystack[i]; --i, --j) {
                if (j == 0) {
                    return i;
                }
            }
            // i += needle.length - j; // For naive method
            i += Math.max(offsetTable[needle.length - 1 - j], charTable[haystack[i]]);
        }
        return -1;
    }

    /**
     * Makes the jump table based on the mismatched character information
     * (bad character rule).
     */
    private static int[] makeCharTable(char[] needle) {
        final int ALPHABET_SIZE = Character.MAX_VALUE + 1; // 65536
        int[] table = new int[ALPHABET_SIZE];
        for (int i = 0; i < table.length; ++i) {
            table[i] = needle.length;
        }
        for (int i = 0; i < needle.length - 1; ++i) {
            table[needle[i]] = needle.length - 1 - i;
        }
        return table;
    }

    /**
     * Makes the jump table based on the scan offset at which the mismatch occurs
     * (good suffix rule).
     */
    private static int[] makeOffsetTable(char[] needle) {
        int[] table = new int[needle.length];
        int lastPrefixPosition = needle.length;
        for (int i = needle.length; i > 0; --i) {
            if (isPrefix(needle, i)) {
                lastPrefixPosition = i;
            }
            table[needle.length - i] = lastPrefixPosition - i + needle.length;
        }
        for (int i = 0; i < needle.length - 1; ++i) {
            int slen = suffixLength(needle, i);
            table[slen] = needle.length - 1 - i + slen;
        }
        return table;
    }

    /**
     * Is needle[p:end] a prefix of needle?
     */
    private static boolean isPrefix(char[] needle, int p) {
        for (int i = p, j = 0; i < needle.length; ++i, ++j) {
            if (needle[i] != needle[j]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the maximum length of the substring that ends at p and is a suffix.
     * (good suffix rule)
     */
    private static int suffixLength(char[] needle, int p) {
        int len = 0;
        for (int i = p, j = needle.length - 1;
                 i >= 0 && needle[i] == needle[j]; --i, --j) {
            len += 1;
        }
        return len;
    }

Variants

The Boyer–Moore–Horspool algorithm is a simplification of the Boyer–Moore algorithm that uses only the bad character rule.
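
A minimal sketch of the Horspool simplification is given below, under a few stated assumptions: the function name `horspool_search` is hypothetical, the shift table is a dictionary rather than a fixed-size array, the shift is keyed on the text character aligned with the last pattern character, and the window is compared with a slice instead of an explicit right-to-left loop.

```python
from typing import Dict, List


def horspool_search(P: str, T: str) -> List[int]:
    """Return the 0-based start index of every occurrence of P in T."""
    n, m = len(P), len(T)
    if n == 0 or m < n:
        return []
    # For each character, the distance from its rightmost occurrence
    # (ignoring the last position of P) to the end of P.
    shift: Dict[str, int] = {c: n - 1 - i for i, c in enumerate(P[:-1])}
    matches = []
    k = n - 1                      # index in T aligned with the last character of P
    while k < m:
        if T[k - n + 1:k + 1] == P:
            matches.append(k - n + 1)
        k += shift.get(T[k], n)    # characters absent from P[:-1] shift by n
    return matches


print(horspool_search("PAN", "ANPANMAN"))
```

For PAN in ANPANMAN the sketch inspects only three alignments and reports the start index 2, which is the 0-based form of the alignment k = 5 used earlier in the article.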

The Apostolico–Giancarlo algorithm speeds up the process of checking whether a match has occurred at a given alignment by skipping explicit character comparisons. It uses information gathered during the preprocessing of the pattern together with the suffix match lengths recorded at each match attempt. Storing the suffix match lengths requires an additional table equal in size to the text being searched.

References

[1] Hume and Sunday (November 1991). "Fast String Searching". Software: Practice and Experience. 21 (11): 1221–1248.
[2] Boyer, Robert S.; Moore, J Strother (October 1977). "A Fast String Searching Algorithm". Comm. ACM. New York, NY, USA: Association for Computing Machinery. 20 (10): 762–772. ISSN 0001-0782. doi:10.1145/359842.359859.
[3] Gusfield, Dan (1999) [1997], "Chapter 2 - Exact Matching: Classical Comparison-Based Methods", Algorithms on Strings, Trees, and Sequences (1st ed.), Cambridge University Press, pp. 19–21, ISBN 978-0-521-58519-4
[4] Galil, Z. (September 1979). "On improving the worst case running time of the Boyer-Moore string matching algorithm". Comm. ACM. New York, NY, USA: Association for Computing Machinery. 22 (9): 505–508. ISSN 0001-0782. doi:10.1145/359146.359148.
[5] Knuth, Donald; Morris, James H.; Pratt, Vaughan (1977). "Fast pattern matching in strings". SIAM Journal on Computing. 6 (2): 323–350. doi:10.1137/0206024. Archived from the original on 4 January 2010. Retrieved 30 May 2013.
[6] Guibas, Leonidas; Odlyzko, Andrew (1977). "A new proof of the linearity of the Boyer-Moore string searching algorithm". Proceedings of the 18th Annual Symposium on Foundations of Computer Science. Washington, DC, USA: IEEE Computer Society: 189–195. doi:10.1109/SFCS.1977.3.
[7] Cole, Richard (September 1991). "Tight bounds on the complexity of the Boyer-Moore string matching algorithm". Proceedings of the 2nd annual ACM-SIAM symposium on Discrete algorithms. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics: 224–233. ISBN 978-0-89791-376-8.

External links

Original paper on the Boyer-Moore algorithm
An example of the Boyer–Moore algorithm from the homepage of J Strother Moore, co-inventor of the algorithm
Richard Cole's 1991 paper proving runtime linearity
