459. Repeated Substring Pattern

Problem Definition

LeetCode link: 459. Repeated Substring Pattern

Given a string s, check if it can be constructed by taking a substring of it and appending multiple copies of the substring together.

Example 1:

Input: s = "abab"
Output: true
Explanation: It is the substring "ab" twice.

Example 2:

Input: s = "aba"
Output: false

Example 3:

Input: s = "abcabcabcabc"
Output: true
Explanation: It is the substring "abc" four times or the substring "abcabc" twice.

Constraints:

1 <= s.length <= 10⁴
s consists of lowercase English letters.

Thought Process

The problem involves identifying whether a string has a repeating pattern. A brute-force approach involves checking all possible substrings to see if they form the original string by repetition. However, this results in a time complexity of O(n^2). More efficient solutions can be achieved using pattern-matching algorithms.

The two main approaches are:

String Manipulation with Concatenation
KMP (Knuth-Morris-Pratt) Algorithm

KMP Algorithm has been mentioned in the previous post, you can click into the below post to review the details of it.

Solution 1: String Manipulation with Concatenation

Explanation

If a string s can be constructed by repeating a substring, then appending s to itself (resulting in s + s) and removing the first and last characters will always include s as a substring in the middle. This approach leverages the properties of repeated substrings.

Statement to Prove

If a string s can be constructed by repeating a substring, then appending s to itself (resulting in s + s) and removing the first and last characters will always include s as a substring.

Definitions

Let s = p + p + ... + p (n times repetition of substring p).
- The length of s is len(s) = n * len(p), where len(p) is the length of the repeated substring.
Define s + s = s + s (concatenation of s with itself).
Define t = (s + s)[1:-1], which is s + s with its first and last characters removed.

Proof of Sufficiency (If `s` is composed of repeated substrings)

If s is composed of repeated substrings, then s will appear in t = (s + s)[1:-1].

Structure of t

t is formed by removing the first character of s and the last character of s from t = s[1:] + s[:-1]

Alignment of s in t

s = p + p + ... + p (n times repetition of p).
In t, the overlap between the end of the first s and the beginning of the second s ensures that s aligns as a substring.
The repeating pattern p naturally forms s in t.

Example Walkthrough

For s = "abcabc":

s + s = "abcabcabcabc"
t = (s + s)[1:-1] = "bcabcabcab"
s = "abcabc" appears in t.

Proof of Necessity (If `s` appears in `t`, it is composed of repeated substrings)

If s appears in t, then s must be composed of repeated substrings.

Overlapping Structure

For s to appear in t = (s + s)[1:-1], the end of the first s and the beginning of the second s in s + s must align perfectly.
This alignment is only possible if s is composed of repeated substrings p.

Counterexample for Non-Repeated s

If s is not composed of repeated substrings, there is no possible alignment for s to appear in t.
Example:
- s = "abcd"
- s + s = "abcdabcd"
- t = (s + s)[1:-1] = "bcdabc"
- s = "abcd" does not appear in `t

Conclusion

The construction of t = (s + s)[1:-1] ensures that s will only appear as a substring in t if s is composed of repeated substrings. This proves both the sufficiency and necessity of the approach.

Algorithm

Create a new string by appending s to itself.
Remove the first and last characters of this new string.
Check if the original string s appears in the modified string.

Python Implementation

class Solution:
    def repeatedSubstringPattern(self, s: str) -> bool:
        # Create a new string by appending s to itself
        concatenated = (s + s)[1:-1]
        # Check if the original string appears in the middle
        return s in concatenated

Complexity

Time Complexity: O(n) as the substring search is O(n).
Space Complexity: O(n), for the concatenated string.

Solution 2: KMP Algorithm

Explanation

The KMP algorithm is typically used for substring search but is highly effective for this problem. By calculating the "next" array (or prefix table), we can identify the longest prefix which is also a suffix. If the string is made of repeated substrings, the length of the string minus this prefix-suffix length gives the smallest repeating substring length.

Steps in the Algorithm

Prefix Table Construction:
- The prefix table keeps track of the length of the longest prefix that matches the suffix for substrings of the input string.
- For example, if the prefix table for a string text is [0, 0, 1, 2, 3], it means:
  - The longest prefix-suffix match for text[:5] has a length of 3.
Key Observation:
- If the entire string text has a prefix-suffix match of length last_prefix_length, then the remainder of the string, len(text) - last_prefix_length, must form the repeating unit.
- Check if len(text) is divisible by len(text) - last_prefix_length. If true, the string is composed of repeated substrings.

Proof on Why it Works

Let's use the string abababab to understand why this works:

First, let's understand what last_prefix_length means:
- For abababab, last_prefix_length = 6 (the prefix ababab matches the suffix ababab)
- Visually: [ababab]ab = ab[ababab]
Why is (len(text) - last_prefix_length) the repeating unit?
- In our example:
  - len(text) = 8 (abababab)
  - last_prefix_length = 6 (ababab)
  - len(text) - last_prefix_length = 2
  - This 2 represents ab, which is indeed our repeating unit!
Why does this work mathematically?
- If we have a string that can be divided into repeating units:
  - The entire string is made up of k copies of some substring
  - Let's call the substring length x
  - Then total length = k * x
  - In our case, x = len(text) - last_prefix_length
The crucial insight:
- If we have a proper prefix-suffix match of length last_prefix_length
- And if len(text) - last_prefix_length is truly the repeating unit
- Then len(text) MUST be divisible by this length
- Because the string must contain a whole number of repetitions

Here's a concrete proof:

Given: "abababab"
- last_prefix_length = 6 (matches "ababab")
- len(text) = 8
- potential unit length = 8 - 6 = 2 ("ab")
- 8 ÷ 2 = 4 (a whole number!)
- This means "ab" repeats exactly 4 times

Another example with abaabaaba:

- last_prefix_length = 6 (matches "abaaba")
- len(text) = 9
- potential unit length = 9 - 6 = 3
- 9 ÷ 3 = 3 ("aba" is a repeating unit here)

Python Implementation

class Solution:
    def repeatedSubstringPattern(self, text: str) -> bool:
        """
        Determines if a string can be constructed by repeating a substring.
        
        Args:
            text (str): The input string to check
            
        Returns:
            bool: True if the string is made up of a repeated substring, False otherwise
            
        Example:
            >>> solution = Solution()
            >>> solution.repeatedSubstringPattern("abab")
            True  # "ab" is the repeated substring
            >>> solution.repeatedSubstringPattern("aba")
            False  # No repeated substring pattern
        """
        if not text:
            return False
            
        prefix_table = [0] * len(text)
        self.build_prefix_table(prefix_table, text)
        
        last_prefix_length = prefix_table[-1]
        potential_substring_length = len(text) - last_prefix_length
        
        return (last_prefix_length != 0 and 
                len(text) % potential_substring_length == 0)
    
    def build_prefix_table(self, prefix_table: list[int], pattern: str) -> list[int]:
        """
        Builds the KMP prefix table (also known as the partial match table)
        for the given pattern string.
        
        Args:
            prefix_table: Pre-allocated list to store the prefix values
            pattern: The string to build the prefix table for
            
        Returns:
            list[int]: The prefix table where each index i contains the length of the longest proper prefix that is also a suffix for pattern[:i+1]
        """
        prefix_table[0] = 0
        prefix_length = 0
        
        for current_pos in range(1, len(pattern)):
            # Try to extend the previous prefix
            while prefix_length > 0 and pattern[current_pos] != pattern[prefix_length]:
                prefix_length = prefix_table[prefix_length - 1]
            
            # If characters match, extend the prefix
            if pattern[current_pos] == pattern[prefix_length]:
                prefix_length += 1
                
            prefix_table[current_pos] = prefix_length
            
        return prefix_table

Complexity

Time Complexity: O(n)

Space Complexity: O(n)

Key Takeaways

Two Effective Approaches:
- String Concatenation: A clever O(n) solution using string duplications
- KMP Algorithm: An efficient pattern matching solution using prefix tables
String Concatenation Method:
- If s is repeatable, s will appear in (s + s)[1:-1]
- Simpler to implement but requires more space
- Time and space complexity both O(n)
KMP Algorithm Method:
- Uses prefix table to find repeated patterns
- Can determine repeatability by checking if length - prefix_length divides the string length
- More complex to implement but provides deeper pattern insight
- O(n) time complexity with O(n) space
Algorithmic Insights:
- Pattern matching problems often have multiple valid approaches
- Understanding the mathematical proof behind each solution is crucial
- Edge cases handling is essential for both methods

Conclusion

The Repeated Substring Pattern problem demonstrates how string pattern matching can be solved through different algorithmic approaches. While the String Concatenation method offers an elegant and concise solution, the KMP Algorithm provides a more fundamental understanding of pattern matching mechanics. Both solutions achieve O(n) time complexity, making them efficient for large-scale applications. This problem serves as an excellent example of how deep understanding of string properties and pattern matching algorithms can lead to efficient solutions in string manipulation tasks. For coding interviews, knowing both approaches provides flexibility in choosing the most appropriate solution based on the specific requirements and constraints of the problem at hand.

459. Repeated Substring Pattern

Problem Definition

Thought Process

Solution 1: String Manipulation with Concatenation

Explanation

Statement to Prove

Definitions

Proof of Sufficiency (If `s` is composed of repeated substrings)

Proof of Necessity (If `s` appears in `t`, it is composed of repeated substrings)

Conclusion

Algorithm

Python Implementation

Complexity

Solution 2: KMP Algorithm

Explanation

Steps in the Algorithm

Proof on Why it Works

Python Implementation

Complexity

Key Takeaways

Conclusion

Discussion

Read next

Recommender System Evaluation (Part 3): Real-World Deployment - When Rubber Meets the Road

Recommender System Evaluation (Part 2): Beyond Accuracy - The User Experience Dimension

Fuel my Writing ⛽

Problem Definition

Thought Process

Solution 1: String Manipulation with Concatenation

Explanation

Statement to Prove

Definitions

Proof of Sufficiency (If s is composed of repeated substrings)

Proof of Necessity (If s appears in t, it is composed of repeated substrings)

Conclusion

Algorithm

Python Implementation

Complexity

Solution 2: KMP Algorithm

Explanation

Steps in the Algorithm

Proof on Why it Works

Python Implementation

Complexity

Key Takeaways

Conclusion

Discussion

Read next

Recommender System Evaluation (Part 3): Real-World Deployment - When Rubber Meets the Road

Recommender System Evaluation (Part 2): Beyond Accuracy - The User Experience Dimension

Fuel my Writing ⛽

Proof of Sufficiency (If `s` is composed of repeated substrings)

Proof of Necessity (If `s` appears in `t`, it is composed of repeated substrings)