Demystifying Algorithms: Rabin-Karp

craftedwithintent

Philip Thomas

Posted on November 24, 2024

What is Rabin-Karp?

Rabin-Karp is an efficient string-searching algorithm that uses hashing to find patterns within a text. By comparing the hash of each text window with the hash of the pattern, it avoids character-by-character comparisons in most cases. This approach is particularly effective for multiple pattern searches, where the hashes of all patterns can be computed up front and each text window checked against them.

The key strength of Rabin-Karp lies in its use of a rolling hash function, which allows for rapid recomputation of hash values for overlapping substrings. However, its performance can degrade if hash collisions occur frequently, requiring additional character comparisons.


The Technical View

  • Time Complexity:

    • Best and Average Case: \(O(n + m)\), where \(n\) is the text length and \(m\) is the pattern length. With a well-distributed hash, hash comparisons dominate and full character comparisons are rare.
    • Worst Case: \(O(n \cdot m)\), if hash collisions occur frequently and nearly every window needs a full character comparison.
  • Space Complexity: \(O(1)\).

    • Only a handful of hash values are stored, regardless of the input size.
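
To make the gap between the two cases concrete, here is a minimal sketch that counts how many windows pass the hash check (each one costs a full \(m\)-character verification) and how many of those are real matches. The class `CollisionDemo` and the method `CountHits` are hypothetical names introduced for this illustration, and the hash is recomputed from scratch per window to keep the sketch short.

```csharp
using System;

public static class CollisionDemo
{
    // Counts windows whose hash equals the pattern hash (each triggers an
    // m-character verification) and how many of those are real matches.
    public static (int hashHits, int realMatches) CountHits(string text, string pattern, int prime)
    {
        int n = text.Length, m = pattern.Length;
        if (m == 0 || m > n) return (0, 0);

        int hashHits = 0, realMatches = 0;
        long patternHash = Hash(pattern, 0, m, prime);

        for (int i = 0; i <= n - m; i++)
        {
            // Recomputed from scratch for clarity; a real search would roll the hash
            if (Hash(text, i, m, prime) != patternHash) continue;

            hashHits++;
            if (string.CompareOrdinal(text, i, pattern, 0, m) == 0) realMatches++;
        }
        return (hashHits, realMatches);
    }

    // Polynomial hash of s[start .. start + length - 1] with base 256
    private static long Hash(string s, int start, int length, int prime)
    {
        long hash = 0;
        for (int i = start; i < start + length; i++)
            hash = (256 * hash + s[i]) % prime;
        return hash;
    }
}
```

With prime 101, `CollisionDemo.CountHits("ababcababc", "abc", 101)` returns (2, 2): only the two real matches are verified. With a deliberately tiny prime of 3, the same call returns (4, 2), so half of the verifications are wasted on collisions. When every window collides or matches, as in `CountHits("aaaaaaaaaa", "aaa", 101)`, which returns (8, 8), the work approaches \(O(n \cdot m)\).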

How Rabin-Karp Works

The algorithm operates in two main phases:

  1. Hash Calculation:
    • Compute the hash for the pattern.
    • Compute the hash for the first substring (window) in the text of the same length as the pattern.
  2. Sliding Window and Matching:
    • Slide the window one character at a time through the text.
    • Update the hash using a rolling hash function to avoid recomputing from scratch.
    • Compare the hash of the current window with the pattern hash. If they match, verify the actual characters to confirm the match (to handle hash collisions).
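
The two phases can be written compactly. The notation here is an assumption made for illustration: \(b\) is the base (256 in the code in this post), \(q\) is the modulus (the prime parameter), \(t_i\) is the character code at position \(i\) of the text, and \(H_i\) is the hash of the window starting at position \(i\).

```latex
% Hash of the window t_i ... t_{i+m-1}
H_i = \Bigl( \sum_{j=0}^{m-1} t_{i+j} \, b^{\,m-1-j} \Bigr) \bmod q

% Rolling update: drop t_i, shift by one position, append t_{i+m}
H_{i+1} = \bigl( b \, ( H_i - t_i \, h ) + t_{i+m} \bigr) \bmod q,
\qquad h = b^{\,m-1} \bmod q
```

The update costs a constant number of operations per slide, which is why the whole scan needs only \(O(n)\) hash work after the initial \(O(m)\) setup.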

A Fifth-Grader's Summary

Imagine you’re looking for your friend’s handwriting in a stack of papers. Instead of reading every word, you first check the paper’s overall look (its hash). If it looks similar, you take a closer look to confirm. Rabin-Karp lets you skim through the papers efficiently!


Real-World Example

Consider searching for DNA sequences (patterns) in a genome (text). Rabin-Karp’s ability to compute hashes for multiple patterns and compare them efficiently makes it an excellent choice for this task.


Examples with Code, Detailed Iterations, and Optimized Patterns


1. Single Pattern Search

Problem: Find the first occurrence of a pattern in a text using Rabin-Karp.

Code:

public static int RabinKarp(string text, string pattern, int prime = 101)
{
    int m = pattern.Length;
    int n = text.Length;
    int patternHash = 0, windowHash = 0, h = 1;

    // Edge case: Pattern length > text length
    if (m > n)
    {
        return -1; // Pattern cannot exist in the text
    }

    // Compute h = pow(256, m-1) % prime
    for (int i = 0; i < m - 1; i++)
        h = (h * 256) % prime;

    // Compute the hash value for the pattern
    for (int i = 0; i < m; i++)
    {
        patternHash = (256 * patternHash + pattern[i]) % prime;
    }

    // Compute the hash value for the first window of the text
    // (m <= n is guaranteed by the edge-case check above)
    for (int i = 0; i < m; i++)
    {
        windowHash = (256 * windowHash + text[i]) % prime;
    }

    // Slide the window across the text
    for (int i = 0; i <= n - m; i++)
    {
        // Compare hash values
        if (patternHash == windowHash)
        {
            // Confirm by comparing actual characters to handle hash collisions
            // (CompareOrdinal compares in place, without allocating a substring)
            if (string.CompareOrdinal(text, i, pattern, 0, m) == 0)
                return i; // Pattern found
        }

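        // Compute the hash for the next window
        // (int arithmetic stays in range for the default prime of 101; a much larger prime would need long)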
        if (i < n - m)
        {
            windowHash = (256 * (windowHash - text[i] * h) + text[i + m]) % prime;

            // Ensure non-negative hash values
            if (windowHash < 0)
                windowHash += prime;
        }
    }

    return -1; // Pattern not found
}

// Example Usage
string text = "ababcababc";
string pattern = "abc";
Console.WriteLine(RabinKarp(text, pattern)); // Output: 2

What Happens in Each Iteration:

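  1. Hash Initialization:
    • Compute patternHash for "abc": \(90\).
    • Compute windowHash for the first window "aba": \(88\).
  2. First Window Comparison (index 0):
    • \(windowHash \neq patternHash\) (88 vs. 90). Slide the window.
  3. Second Window Comparison (index 1):
    • Roll windowHash to "bab": \(22\). \(windowHash \neq patternHash\). Slide the window.
  4. Third Window Comparison (index 2):
    • Roll windowHash to "abc": \(90\). \(windowHash = patternHash\).
    • Verify the actual characters: "abc" matches, so the function returns index \(2\).
  5. Subsequent Comparisons:
    • None are needed here, because the function returns at the first match. Had no match been found, the window would keep sliding to the end of the text and the function would return -1.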
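
To check a walkthrough like this one by hand, it can help to print the hash of every window. The following is a small sketch that reuses the same base (256) and rolling update as the function above; the class `RabinKarpTrace` and the method `WindowHashes` are hypothetical helpers added for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class RabinKarpTrace
{
    // Returns the hash of every length-m window of text, left to right,
    // using the same rolling update as the search function.
    public static List<int> WindowHashes(string text, int m, int prime = 101)
    {
        var hashes = new List<int>();
        int n = text.Length;
        if (m <= 0 || m > n) return hashes;

        // h = pow(256, m-1) % prime
        int h = 1;
        for (int i = 0; i < m - 1; i++) h = (h * 256) % prime;

        // Hash of the first window
        int windowHash = 0;
        for (int i = 0; i < m; i++) windowHash = (256 * windowHash + text[i]) % prime;
        hashes.Add(windowHash);

        // Roll the hash across the remaining windows
        for (int i = 0; i < n - m; i++)
        {
            windowHash = (256 * (windowHash - text[i] * h) + text[i + m]) % prime;
            if (windowHash < 0) windowHash += prime;
            hashes.Add(windowHash);
        }
        return hashes;
    }
}
```

For example, `string.Join(", ", RabinKarpTrace.WindowHashes("ababc", 3))` yields "88, 22, 90": the hashes of "aba", "bab", and "abc", where only the last equals the hash of the pattern "abc".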

2. Multiple Pattern Search

Problem: Find occurrences of multiple patterns in a text.

Code:

public static List<int> RabinKarpMulti(string text, List<string> patterns, int prime = 101)
{
    var result = new List<int>();
    foreach (var pattern in patterns)
    {
        int index = RabinKarp(text, pattern, prime);
        result.Add(index);
    }
    return result;
}

// Example Usage
string text = "ababcababc";
List<string> patterns = new List<string> { "abc", "ab" };
List<int> results = RabinKarpMulti(text, patterns);
Console.WriteLine(string.Join(", ", results)); // Output: 2, 0

What Happens:

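  1. For each pattern ("abc", "ab"), call the Rabin-Karp function, which computes that pattern's hash and scans the text.
  2. Collect the index of each pattern's first match (or -1 if it is absent) in input order: 2 for "abc" and 0 for "ab". Note that each pattern triggers its own pass over the text.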
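
When all patterns have the same length, a common variant scans the text only once and looks up each window hash in a table of pattern hashes. The sketch below illustrates that idea and is not part of the code above; the class `RabinKarpSet`, the method `FindAll`, and the equal-length restriction are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class RabinKarpSet
{
    // Maps each pattern to the index of its first match (or -1), scanning the text once.
    // Assumption: all patterns are non-empty and share the same length.
    public static Dictionary<string, int> FindAll(string text, IReadOnlyList<string> patterns, int prime = 101)
    {
        var result = new Dictionary<string, int>();
        foreach (var p in patterns) result[p] = -1;
        if (patterns.Count == 0) return result;

        int m = patterns[0].Length, n = text.Length;
        if (m == 0 || m > n) return result;

        // Group patterns by hash so one lookup per window finds all candidates
        var byHash = new Dictionary<int, List<string>>();
        foreach (var p in patterns)
        {
            if (p.Length != m) throw new ArgumentException("All patterns must have the same length.");
            int ph = 0;
            foreach (char c in p) ph = (256 * ph + c) % prime;
            if (!byHash.TryGetValue(ph, out var bucket)) byHash[ph] = bucket = new List<string>();
            bucket.Add(p);
        }

        // h = pow(256, m-1) % prime, then the hash of the first window
        int h = 1;
        for (int i = 0; i < m - 1; i++) h = (h * 256) % prime;
        int windowHash = 0;
        for (int i = 0; i < m; i++) windowHash = (256 * windowHash + text[i]) % prime;

        for (int i = 0; i <= n - m; i++)
        {
            // Verify characters only for patterns whose hash matches this window
            if (byHash.TryGetValue(windowHash, out var candidates))
                foreach (var p in candidates)
                    if (result[p] < 0 && string.CompareOrdinal(text, i, p, 0, m) == 0)
                        result[p] = i;

            if (i < n - m)
            {
                windowHash = (256 * (windowHash - text[i] * h) + text[i + m]) % prime;
                if (windowHash < 0) windowHash += prime;
            }
        }
        return result;
    }
}
```

Calling `RabinKarpSet.FindAll("ababcababc", new[] { "abc", "bca", "xyz" })` maps "abc" to 2, "bca" to 3, and "xyz" to -1 after a single pass over the text. Patterns of different lengths would need one pass per distinct length, or a different algorithm.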

3. Rolling Hash Demonstration

Problem: Demonstrate the efficiency of the rolling hash.

Code:

public static void RollingHashDemo()
{
    string text = "abcd";
    int prime = 101;
    int m = 2; // Window length
    int h = 1;

    // Compute h = pow(256, m-1) % prime (54 for m = 2)
    for (int i = 0; i < m - 1; i++) h = (h * 256) % prime;

    // Compute initial hash for "ab"
    int windowHash = (256 * text[0] + text[1]) % prime;
    Console.WriteLine($"Initial Hash: {windowHash}");

    // Slide the window to "bc": remove 'a', shift, append 'c'
    windowHash = (256 * (windowHash - text[0] * h) + text[2]) % prime;
    if (windowHash < 0) windowHash += prime;
    Console.WriteLine($"New Hash: {windowHash}");
}

// Example Usage
RollingHashDemo();
// Output:
// Initial Hash: 84
// New Hash: 38

What Happens in Each Iteration:

  1. Compute initial hash for "ab": \(84\).
  2. Update hash for "bc" using the rolling hash: \(38\).
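
The arithmetic behind these two values can be checked by hand. This worked example assumes the character codes 'a' = 97, 'b' = 98, 'c' = 99 and a window length of \(m = 2\), so that \(h = 256^{m-1} \bmod 101 = 54\).

```latex
\begin{align*}
H(\texttt{"ab"}) &= (256 \cdot 97 + 98) \bmod 101 = 24930 \bmod 101 = 84 \\
H(\texttt{"bc"}) &= \bigl(256 \cdot (84 - 97 \cdot 54) + 99\bigr) \bmod 101 \\
                 &= (-1319325) \bmod 101 = 38 \\
\text{Direct check:}\quad H(\texttt{"bc"}) &= (256 \cdot 98 + 99) \bmod 101 = 25187 \bmod 101 = 38
\end{align*}
```

In C#, the % operator keeps the sign of the dividend, so the rolled value first comes out as -63; adding the prime (101) brings it to 38, which is why the code adjusts negative results.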

Conclusion

The Rabin-Karp algorithm shines in scenarios where multiple pattern searches are required or when efficient hashing can minimize character comparisons. The implementation above computes the pattern hash and the window hash in separate loops and confirms every hash match with a character comparison, which keeps the code clear and guards against hash collisions.

Mastering Rabin-Karp provides a strong foundation for understanding hashing in algorithm design and prepares you for tackling complex pattern-matching challenges!
