MARK C. answered 8d
Computer Programming Tutor | C++/Python/Data Structures | 20 Years Exp
Direction of Comparison:
KMP: Scans the pattern left-to-right
- Compares characters from left to right in the pattern
Boyer-Moore: Scans the pattern right-to-left
- Compares characters from right to left in the pattern
Preprocessing:
KMP: Builds a "failure function" or "prefix table"
- Analyzes the pattern to find longest proper prefix that is also a suffix
- Used to avoid re-comparing characters we already know match
Boyer-Moore: Uses two rules with lookup tables
- Bad character rule: table of last occurrence of each character in pattern
- Good suffix rule: table based on matching suffixes
Skip Logic:
KMP: Never moves backward in the text
- When mismatch occurs, shifts pattern based on prefix table
- Skips ahead but typically by smaller amounts
- Guarantees each text character examined at most once
Boyer-Moore: Can skip large portions of text
- Uses bad character rule to skip when text character not in pattern
- Can skip multiple characters at once
- More aggressive skipping, especially with large alphabets
Performance:
KMP:
- Worst case: O(n + m) where n = text length, m = pattern length
- Good for small alphabets
- Simpler to implement
Boyer-Moore:
- Worst case: O(nm) but rarely occurs
- Average case: O(n/m) - sublinear, very fast in practice
- Best for large alphabets and longer patterns
- More complex to implement
Practical difference:
Boyer-Moore is generally faster in practice because it can skip more characters, especially when searching English text or DNA sequences. KMP has better worst-case guarantees and is simpler to understand and code.