Demystifying Algorithms: Rabin-Karp
Philip Thomas
Posted on November 24, 2024
What is Rabin-Karp?
Rabin-Karp is an efficient string-searching algorithm that uses hashing to find patterns within a text. By comparing hash values of substrings, it avoids character-by-character comparisons in most cases. This approach is particularly effective for multiple pattern searches, where the algorithm computes hashes for all patterns and checks them against a text efficiently.
The key strength of Rabin-Karp lies in its use of a rolling hash function, which allows for rapid recomputation of hash values for overlapping substrings. However, its performance can degrade if hash collisions occur frequently, requiring additional character comparisons.
The Technical View
- Time Complexity:
  - Best Case: \(O(n + m)\), where \(n\) is the text length and \(m\) is the pattern length. Hash comparisons dominate, avoiding character-by-character scans.
  - Worst Case: \(O(n \cdot m)\), if hash collisions occur frequently, requiring repeated character comparisons.
- Space Complexity: \(O(1)\).
  - Minimal additional memory is required to store hash values.
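To make the gap between the best and worst case concrete, here is a minimal sketch that counts "spurious hits": windows whose hash equals the pattern hash even though the characters differ. `CountSpuriousHits` and its `modulus` parameter are hypothetical names introduced for illustration, and the code is written as C# top-level statements with local functions. A modulus of 1 is deliberately degenerate (it is not a prime): it maps every hash to 0, so every non-matching window costs a full character comparison.

```csharp
using System;

// Hypothetical helper (not from the original post): counts windows whose
// hash matches the pattern hash although the characters do not match.
static int CountSpuriousHits(string text, string pattern, int modulus)
{
    int m = pattern.Length, n = text.Length;
    if (m > n) return 0;

    // h = pow(256, m-1) % modulus
    long h = 1;
    for (int i = 0; i < m - 1; i++) h = (h * 256) % modulus;

    // Hash the pattern and the first window of the text
    long patternHash = 0, windowHash = 0;
    for (int i = 0; i < m; i++)
    {
        patternHash = (256 * patternHash + pattern[i]) % modulus;
        windowHash = (256 * windowHash + text[i]) % modulus;
    }

    int spurious = 0;
    for (int i = 0; i <= n - m; i++)
    {
        // A hash match that fails the character check is a spurious hit
        if (patternHash == windowHash &&
            string.CompareOrdinal(text, i, pattern, 0, m) != 0)
            spurious++;

        if (i < n - m)
        {
            windowHash = (256 * (windowHash - text[i] * h) + text[i + m]) % modulus;
            if (windowHash < 0) windowHash += modulus;
        }
    }
    return spurious;
}

Console.WriteLine(CountSpuriousHits("abcabcabd", "abd", 1));   // 6
Console.WriteLine(CountSpuriousHits("abcabcabd", "abd", 101)); // 0
```

With modulus 1, all six non-matching windows trigger a character comparison, which is the \(O(n \cdot m)\) behavior; with 101, none of them do for this input.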
How Rabin-Karp Works
The algorithm operates in two main phases:
- Hash Calculation:
  - Compute the hash for the pattern.
  - Compute the hash for the first substring (window) in the text of the same length as the pattern.
- Sliding Window and Matching:
  - Slide the window one character at a time through the text.
  - Update the hash using a rolling hash function to avoid recomputing from scratch.
  - Compare the hash of the current window with the pattern hash. If they match, verify the actual characters to confirm the match (to handle hash collisions).
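The two phases above can be written as formulas. This is a sketch in notation chosen here for illustration, not taken from the original post: \(t_i\) is the character code at position \(i\) of the text, \(q\) is the prime modulus, \(H_i\) is the hash of the window starting at index \(i\), and 256 is the base used in the code further down.

```latex
\begin{align*}
h       &= 256^{\,m-1} \bmod q \\
H_0     &= \Bigl(\sum_{j=0}^{m-1} 256^{\,m-1-j}\, t_j\Bigr) \bmod q \\
H_{i+1} &= \bigl(256\,(H_i - t_i\,h) + t_{i+m}\bigr) \bmod q
\end{align*}
```

The update removes the contribution of the outgoing character \(t_i\), shifts the remaining characters up by one position, and adds the incoming character \(t_{i+m}\), so each slide costs constant time. In C#, the `%` operator can return a negative result when the left operand is negative, which is why an implementation adds \(q\) back whenever the updated hash falls below zero.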
A Fifth-Grader's Summary
Imagine you’re looking for your friend’s handwriting in a stack of papers. Instead of reading every word, you first check the paper’s overall look (its hash). If it looks similar, you take a closer look to confirm. Rabin-Karp lets you skim through the papers efficiently!
Real-World Example
Consider searching for DNA sequences (patterns) in a genome (text). Rabin-Karp’s ability to compute hashes for multiple patterns and compare them efficiently makes it an excellent choice for this task.
Examples with Code, Detailed Iterations, and Optimized Patterns
1. Single Pattern Search
Problem: Find the first occurrence of a pattern in a text using Rabin-Karp.
Code:
public static int RabinKarp(string text, string pattern, int prime = 101)
{
    int m = pattern.Length;
    int n = text.Length;
    int patternHash = 0, windowHash = 0, h = 1;

    // Edge case: Pattern length > text length
    if (m > n)
    {
        return -1; // Pattern cannot exist in the text
    }

    // Compute h = pow(256, m-1) % prime
    for (int i = 0; i < m - 1; i++)
        h = (h * 256) % prime;

    // Compute the hash value for the pattern
    for (int i = 0; i < m; i++)
    {
        patternHash = (256 * patternHash + pattern[i]) % prime;
    }

    // Compute the hash value for the first window of the text
    for (int i = 0; i < m && i < n; i++)
    {
        windowHash = (256 * windowHash + text[i]) % prime;
    }

    // Slide the window across the text
    for (int i = 0; i <= n - m; i++)
    {
        // Compare hash values
        if (patternHash == windowHash)
        {
            // Confirm by comparing actual characters to handle hash collisions
            if (text.Substring(i, m) == pattern)
                return i; // Pattern found
        }

        // Compute the hash for the next window
        if (i < n - m)
        {
            windowHash = (256 * (windowHash - text[i] * h) + text[i + m]) % prime;

            // Ensure non-negative hash values
            if (windowHash < 0)
                windowHash += prime;
        }
    }

    return -1; // Pattern not found
}

// Example Usage
string text = "ababcababc";
string pattern = "abc";
Console.WriteLine(RabinKarp(text, pattern)); // Output: 2
What Happens in Each Iteration:
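- Hash Initialization:
  - Compute patternHash for "abc": \(90\).
  - Compute windowHash for the first window "aba": \(88\).
- First Window Comparison (index 0, "aba"):
  - \(88 \neq 90\), so windowHash \(\neq\) patternHash. Slide the window.
- Second Window Comparison (index 1, "bab"):
  - Update windowHash to \(22\). Again windowHash \(\neq\) patternHash. Slide the window.
- Third Window Comparison (index 2, "abc"):
  - Update windowHash to \(90\), so windowHash = patternHash.
  - Verify actual characters: "abc" matches at index \(2\), and the function returns \(2\).
- Subsequent Comparisons:
  - None are needed in this example, because the function returns at the first match. When no window matches, the window keeps sliding until the end of the text and the function returns \(-1\).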
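One way to check a walkthrough like this is to print the hash of every window. The sketch below does that with the same base (256) and default prime (101) as `RabinKarp`; the helper name `WindowHashes` is hypothetical, and the code assumes a small prime so that the `int` arithmetic cannot overflow. It is written as C# top-level statements with a local function.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original post): returns the rolling
// hash of every window of length m in text, in order of starting index.
static List<int> WindowHashes(string text, int m, int prime = 101)
{
    var hashes = new List<int>();
    int n = text.Length;
    if (m <= 0 || m > n) return hashes;

    // h = pow(256, m-1) % prime
    int h = 1;
    for (int i = 0; i < m - 1; i++) h = (h * 256) % prime;

    // Hash of the first window, computed from scratch
    int windowHash = 0;
    for (int i = 0; i < m; i++) windowHash = (256 * windowHash + text[i]) % prime;
    hashes.Add(windowHash);

    // Every later window is derived from the previous one in constant time
    for (int i = 0; i < n - m; i++)
    {
        windowHash = (256 * (windowHash - text[i] * h) + text[i + m]) % prime;
        if (windowHash < 0) windowHash += prime;
        hashes.Add(windowHash);
    }
    return hashes;
}

Console.WriteLine(string.Join(", ", WindowHashes("ababcababc", 3)));
// 88, 22, 90, 28, 9, 88, 22, 90
```

The third value (index 2) is the first one equal to the hash of "abc", which is where the search stops.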
2. Multiple Pattern Search
Problem: Find occurrences of multiple patterns in a text.
Code:
public static List<int> RabinKarpMulti(string text, List<string> patterns, int prime = 101)
{
    var result = new List<int>();

    foreach (var pattern in patterns)
    {
        int index = RabinKarp(text, pattern, prime);
        result.Add(index);
    }

    return result;
}

// Example Usage
string text = "ababcababc";
List<string> patterns = new List<string> { "abc", "ab" };
List<int> results = RabinKarpMulti(text, patterns);
Console.WriteLine(string.Join(", ", results)); // Output: 2, 0
What Happens:
- Compute the hash for each pattern ("abc", "ab").
- Use the Rabin-Karp function for each pattern, returning their respective indices.
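The version above scans the text once per pattern. When all patterns have the same length, a single pass can serve all of them: hash every pattern up front, then look up each window hash in a dictionary. The following is a minimal sketch under that equal-length assumption, not the post's implementation; `FindFirstOfEach` and `HashOf` are hypothetical names, and the code is written as C# top-level statements with local functions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: polynomial hash (base 256) of s[start .. start+length-1].
static int HashOf(string s, int start, int length, int prime)
{
    long hash = 0;
    for (int i = 0; i < length; i++)
        hash = (256 * hash + s[start + i]) % prime;
    return (int)hash;
}

// Hypothetical helper: first index of each pattern (or -1), found in one
// pass over the text. Assumes every pattern has the same length.
static Dictionary<string, int> FindFirstOfEach(string text, List<string> patterns, int prime = 101)
{
    var firstIndex = new Dictionary<string, int>();
    foreach (var p in patterns) firstIndex[p] = -1;
    if (patterns.Count == 0) return firstIndex;

    int m = patterns[0].Length, n = text.Length;

    // Group the patterns by hash so one lookup per window finds all candidates
    var byHash = new Dictionary<int, List<string>>();
    foreach (var p in patterns)
    {
        if (p.Length != m)
            throw new ArgumentException("This sketch assumes equal-length patterns.");
        int ph = HashOf(p, 0, m, prime);
        if (!byHash.TryGetValue(ph, out var bucket))
            byHash[ph] = bucket = new List<string>();
        bucket.Add(p);
    }
    if (m > n) return firstIndex;

    // h = pow(256, m-1) % prime
    long h = 1;
    for (int i = 0; i < m - 1; i++) h = (h * 256) % prime;

    int windowHash = HashOf(text, 0, m, prime);
    for (int i = 0; i <= n - m; i++)
    {
        if (byHash.TryGetValue(windowHash, out var candidates))
        {
            // Verify characters to rule out hash collisions
            foreach (var p in candidates)
                if (firstIndex[p] < 0 && string.CompareOrdinal(text, i, p, 0, m) == 0)
                    firstIndex[p] = i;
        }
        if (i < n - m)
        {
            long rolled = (256 * (windowHash - text[i] * h) + text[i + m]) % prime;
            windowHash = (int)((rolled + prime) % prime);
        }
    }
    return firstIndex;
}

var hits = FindFirstOfEach("ababcababc", new List<string> { "abc", "bca", "xyz" });
Console.WriteLine(string.Join(", ", hits["abc"], hits["bca"], hits["xyz"]));
// 2, 3, -1
```

Patterns of different lengths would need one rolling hash per distinct length, which this sketch leaves out.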
3. Rolling Hash Demonstration
Problem: Demonstrate the efficiency of the rolling hash.
Code:
public static void RollingHashDemo()
{
    string text = "abcd";
    int prime = 101;
    int m = 2; // Window length
    int windowHash = 0, h = 1;

    // Compute h = pow(256, m-1) % prime
    for (int i = 0; i < m - 1; i++) h = (h * 256) % prime;

    // Compute initial hash for "ab"
    windowHash = (256 * text[0] + text[1]) % prime;
    Console.WriteLine($"Initial Hash: {windowHash}");

    // Slide the window to "bc": drop text[0] ('a') and append text[2] ('c')
    windowHash = (256 * (windowHash - text[0] * h) + text[2]) % prime;
    if (windowHash < 0) windowHash += prime;
    Console.WriteLine($"New Hash: {windowHash}");
}

// Example Usage
RollingHashDemo();
// Output:
// Initial Hash: 84
// New Hash: 38
What Happens in Each Iteration:
- Compute initial hash for "ab": (84).
- Update hash for "bc" using the rolling hash: (38).
Conclusion
The Rabin-Karp algorithm shines in scenarios where multiple pattern searches are required or when efficient hashing can minimize character comparisons. The implementation above computes the pattern hash and the text-window hash in separate steps, which keeps the code clear and robust.
Mastering Rabin-Karp provides a strong foundation for understanding hashing in algorithm design and prepares you for tackling complex pattern-matching challenges!