KMP search

Jyothi
5 min readAug 22, 2024

--

An efficient way to solve this problem is by using the Knuth-Morris-Pratt (KMP) algorithm for pattern matching. The KMP algorithm has a time complexity of 𝑂(𝑛+𝑚) where
n is the length of the text and m is the length of the pattern.

The KMP algorithm is designed to find all occurrences of a pattern in a text efficiently by LPS array. LPS array is the array of lengths of longest prefix which is also suffix. This allows the algorithm to avoid redundant comparisons.

Steps Involved in KMP:
Build the LPS array, which helps in determining the next positions to compare when a mismatch occurs.

Search for the Pattern: Use the LPS array to move through the text, comparing the pattern and text while avoiding unnecessary re-comparisons.

The LPS (Longest Prefix Suffix) array helps in optimizing the pattern searching process by allowing the algorithm to skip unnecessary comparisons when a mismatch occurs between the pattern and the text.

How LPS Works:

  1. The LPS array for a given pattern is an array where each element at index i represents the length of the longest proper prefix of the substring pattern[0…i] that is also a suffix of this substring.

Proper Prefix: A prefix that is not equal to the entire string. For example, for the string “abab”, the proper prefixes are “”, “a”, “ab”, and “aba”.

Suffix: A suffix is the ending part of a string. For example, for the string “abab”, the suffixes are “abab”, “bab”, “ab”, and “b”.

  1. Example of LPS Array:
    Consider the pattern P = “ababaca”:

Index i Pattern Prefix (P[0…i]) Longest Prefix Suffix Length LPS Array Value
0 a 0 0
1 ab 0 0
2 aba 1 1
3 abab 2 2
4 ababa 3 3
5 ababac 0 0
6 ababaca 1 1
The LPS array for this pattern is: [0, 0, 1, 2, 3, 0, 1].

  1. How LPS Helps in Pattern Matching:
    When a mismatch occurs during the pattern matching process, the LPS array allows the KMP algorithm to skip redundant comparisons by indicating the next position in the pattern that should be compared with the current character in the text.

Step-by-Step Usage of LPS:

Matching Process:

When you are comparing characters of the text and the pattern, and a mismatch occurs, the LPS array tells you how many characters you can skip.
Skipping Unnecessary Comparisons:

If a mismatch happens at position j in the pattern, you do not need to start matching from the beginning of the pattern. Instead, the LPS array suggests the next position to continue comparing by skipping over the already matched portion that corresponds to a proper prefix which is also a suffix.

For example, if j = 5 and LPS[j-1] = 3, it means that the first three characters of the pattern match the current text position. You can then skip directly to pattern[3] and continue the comparison from there.

Why This Is Efficient:

Avoids Redundant Comparisons: Without the LPS array, you’d have to start from the beginning of the pattern after every mismatch, leading to inefficient comparisons.
Linear Time Complexity: By using the LPS array, the KMP algorithm ensures that each character in the text and pattern is compared only once, leading to an O(n + m) time complexity.
Visual Example:
Suppose you’re searching for the pattern “ababaca” in the text “bacbabababaca”.

Start comparing the pattern with the text from the beginning. When you hit a mismatch after partially matching the pattern, the LPS array helps you avoid re-checking characters that have already been matched.

For instance, after matching “abab” in the text and hitting a mismatch, instead of starting over, the LPS array tells you that you can skip directly to “ab” in the pattern and continue matching from there.

Let’s break down how the KMP (Knuth-Morris-Pratt) algorithm works with the pattern “aaa” and the text “aaaab”. This will illustrate how the algorithm can skip redundant comparisons using the LPS (Longest Prefix Suffix) array.

Step 1: Compute the LPS Array for the Pattern “aaa”
The LPS array is built to capture the longest proper prefix of the pattern that is also a suffix for each position in the pattern.

LPS Array Calculation:
Index i Pattern Prefix (P[0…i]) LPS Value Explanation
0 a 0 No proper prefix that is also a suffix.
1 aa 1 The substring “a” is both a proper prefix and a suffix.
2 aaa 2 The substring “aa” is both a proper prefix and a suffix.
So, the LPS array for “aaa” is [0, 1, 2].

Step 2: Use the KMP Algorithm to Search for the Pattern in the Text “aaaab”
Text: “aaaab”
Pattern: “aaa”
LPS Array: [0, 1, 2]
Iteration Details:
Iteration 1:

i = 0, j = 0: Compare text[0] = ‘a’ with pattern[0] = ‘a’.
They match, so increment both i and j.
i = 1, j = 1
Iteration 2:

i = 1, j = 1: Compare text[1] = ‘a’ with pattern[1] = ‘a’.
They match, so increment both i and j.
i = 2, j = 2
Iteration 3:

i = 2, j = 2: Compare text[2] = ‘a’ with pattern[2] = ‘a’.

They match, so increment both i and j.

i = 3, j = 3

Now j == 3 (which is the length of the pattern), meaning we’ve found a match. The pattern “aaa” is found starting at index i — j = 3–3 = 0.

Update j = LPS[j-1] = LPS[2] = 2 to continue searching for further matches.

Iteration 4:

i = 3, j = 2: Compare text[3] = ‘a’ with pattern[2] = ‘a’.

They match, so increment both i and j.

i = 4, j = 3

Again j == 3, meaning we’ve found another match. The pattern “aaa” is found starting at index i — j = 4–3 = 1.

Update j = LPS[j-1] = LPS[2] = 2 to continue searching.

Iteration 5:

i = 4, j = 2: Compare text[4] = ‘b’ with pattern[2] = ‘a’.
They don’t match.
Since j != 0, use LPS[j-1] to update j: j = LPS[1] = 1. This is where the skipping happens!
Do not increment i.
Explanation of the Skip:

Instead of restarting the pattern search from j = 0, we utilize the LPS array value to skip unnecessary comparisons and continue searching from j = 1.
Iteration 6:

i = 4, j = 1: Compare text[4] = ‘b’ with pattern[1] = ‘a’.
They don’t match.
Since j != 0, use LPS[j-1] to update j: j = LPS[0] = 0.
Increment i.

My solution:

Introducing Jyo Services — Your Go-To Partner for Tech Solutions

Are you looking to boost your career or bring your tech ideas to life? Jyo Services offers a range of specialized solutions tailored just for you:

  • One-on-One Interview Preparation: Tailored coaching for freshers and professionals with up to 7 years of software development experience.
  • Custom Software Development: Expertise in embedded technologies, AI/ML, Gen AI, Cyber Security, Linux-based applications, and network protocol developments.
  • Patent Assistance: Comprehensive support for drafting, filing and maintaining patents in India.
  • College Projects: Professional guidance for Engineering students on academic projects.

Unlock your potential with Jyo Services! Contact us today to get started.

DM me at: www.linkedin.com/in/jyothivs or jyos.v.s@gmail.com

--

--