459. Repeated Substring Pattern
Learn two efficient solutions to LeetCode 459 Repeated Substring Pattern problem: String Concatenation and KMP Algorithm. This guide explains both approaches with Python code, mathematical proofs, and examples. Perfect for coding interviews and string pattern matching problems.
data:image/s3,"s3://crabby-images/4cdba/4cdba921edb47dc3a610a104d2382919dbca74e6" alt="459. Repeated Substring Pattern"
Problem Definition
LeetCode link: 459. Repeated Substring Pattern
Given a string s
, check if it can be constructed by taking a substring of it and appending multiple copies of the substring together.
Example 1:
Input: s = "abab"
Output: true
Explanation: It is the substring "ab" twice.
Example 2:
Input: s = "aba"
Output: false
Example 3:
Input: s = "abcabcabcabc"
Output: true
Explanation: It is the substring "abc" four times or the substring "abcabc" twice.
Constraints:
1 <= s.length <= 104
s
consists of lowercase English letters.
Thought Process
The problem involves identifying whether a string has a repeating pattern. A brute-force approach involves checking all possible substrings to see if they form the original string by repetition. However, this results in a time complexity of O(n^2). More efficient solutions can be achieved using pattern-matching algorithms.
The two main approaches are:
- String Manipulation with Concatenation
- KMP (Knuth-Morris-Pratt) Algorithm
KMP Algorithm has been mentioned in the previous post, you can click into the below post to review the details of it.
data:image/s3,"s3://crabby-images/22d0f/22d0f94c3e06b55173d09eae9916f8bea1a99b53" alt=""
Solution 1: String Manipulation with Concatenation
Explanation
If a string s
can be constructed by repeating a substring, then appending s
to itself (resulting in s + s
) and removing the first and last characters will always include s
as a substring in the middle. This approach leverages the properties of repeated substrings.
Statement to Prove
If a string s
can be constructed by repeating a substring, then appending s
to itself (resulting in s + s
) and removing the first and last characters will always include s
as a substring.
Definitions
- Let
s = p + p + ... + p
(n times repetition of substringp
).- The length of
s
islen(s) = n * len(p)
, wherelen(p)
is the length of the repeated substring.
- The length of
- Define
s + s = s + s
(concatenation ofs
with itself). - Define
t = (s + s)[1:-1]
, which iss + s
with its first and last characters removed.
Proof of Sufficiency (If s
is composed of repeated substrings)
If s
is composed of repeated substrings, then s
will appear in t = (s + s)[1:-1]
.
- Structure of
t
t
is formed by removing the first character ofs
and the last character ofs
fromt = s[1:] + s[:-1]
- Alignment of
s
int
s = p + p + ... + p
(n times repetition ofp
).- In
t
, the overlap between the end of the firsts
and the beginning of the seconds
ensures thats
aligns as a substring. - The repeating pattern
p
naturally formss
int
.
- Example Walkthrough
For s = "abcabc"
:
s + s = "abcabcabcabc"
t = (s + s)[1:-1] = "bcabcabcab"
s = "abcabc"
appears int
.
Proof of Necessity (If s
appears in t
, it is composed of repeated substrings)
If s
appears in t
, then s
must be composed of repeated substrings.
- Overlapping Structure
- For
s
to appear int = (s + s)[1:-1]
, the end of the firsts
and the beginning of the seconds
ins + s
must align perfectly. - This alignment is only possible if
s
is composed of repeated substringsp
.
- Counterexample for Non-Repeated
s
- If
s
is not composed of repeated substrings, there is no possible alignment fors
to appear int
. - Example:
s = "abcd"
s + s = "abcdabcd"
t = (s + s)[1:-1] = "bcdabc"
s = "abcd"
does not appear in `t
Conclusion
The construction of t = (s + s)[1:-1]
ensures that s
will only appear as a substring in t
if s
is composed of repeated substrings. This proves both the sufficiency and necessity of the approach.
Algorithm
- Create a new string by appending
s
to itself. - Remove the first and last characters of this new string.
- Check if the original string
s
appears in the modified string.
Python Implementation
class Solution:
def repeatedSubstringPattern(self, s: str) -> bool:
# Create a new string by appending s to itself
concatenated = (s + s)[1:-1]
# Check if the original string appears in the middle
return s in concatenated
Complexity
- Time Complexity: O(n) as the substring search is O(n).
- Space Complexity: O(n), for the concatenated string.
Solution 2: KMP Algorithm
Explanation
The KMP algorithm is typically used for substring search but is highly effective for this problem. By calculating the "next" array (or prefix table), we can identify the longest prefix which is also a suffix. If the string is made of repeated substrings, the length of the string minus this prefix-suffix length gives the smallest repeating substring length.
Steps in the Algorithm
- Prefix Table Construction:
- The prefix table keeps track of the length of the longest prefix that matches the suffix for substrings of the input string.
- For example, if the prefix table for a string
text
is[0, 0, 1, 2, 3]
, it means:- The longest prefix-suffix match for
text[:5]
has a length of 3.
- The longest prefix-suffix match for
- Key Observation:
- If the entire string
text
has a prefix-suffix match of lengthlast_prefix_length
, then the remainder of the string,len(text) - last_prefix_length
, must form the repeating unit. - Check if
len(text)
is divisible bylen(text) - last_prefix_length
. If true, the string is composed of repeated substrings.
- If the entire string
Proof on Why it Works
Let's use the string abababab
to understand why this works:
- First, let's understand what
last_prefix_length
means:- For
abababab
,last_prefix_length
= 6 (the prefixababab
matches the suffixababab
) - Visually:
[ababab]ab = ab[ababab]
- For
- Why is
(len(text) - last_prefix_length)
the repeating unit?- In our example:
len(text)
= 8 (abababab
)last_prefix_length
= 6 (ababab
)len(text) - last_prefix_length
= 2- This 2 represents
ab
, which is indeed our repeating unit!
- In our example:
- Why does this work mathematically?
- If we have a string that can be divided into repeating units:
- The entire string is made up of
k
copies of some substring - Let's call the substring length
x
- Then total length =
k * x
- In our case,
x = len(text) - last_prefix_length
- The entire string is made up of
- If we have a string that can be divided into repeating units:
- The crucial insight:
- If we have a proper prefix-suffix match of length
last_prefix_length
- And if
len(text) - last_prefix_length
is truly the repeating unit - Then
len(text)
MUST be divisible by this length - Because the string must contain a whole number of repetitions
- If we have a proper prefix-suffix match of length
Here's a concrete proof:
Given: "abababab"
- last_prefix_length = 6 (matches "ababab")
- len(text) = 8
- potential unit length = 8 - 6 = 2 ("ab")
- 8 ÷ 2 = 4 (a whole number!)
- This means "ab" repeats exactly 4 times
Another example with abaabaaba
:
- last_prefix_length = 6 (matches "abaaba")
- len(text) = 9
- potential unit length = 9 - 6 = 3
- 9 ÷ 3 = 3 ("aba" is a repeating unit here)
Python Implementation
class Solution:
def repeatedSubstringPattern(self, text: str) -> bool:
"""
Determines if a string can be constructed by repeating a substring.
Args:
text (str): The input string to check
Returns:
bool: True if the string is made up of a repeated substring, False otherwise
Example:
>>> solution = Solution()
>>> solution.repeatedSubstringPattern("abab")
True # "ab" is the repeated substring
>>> solution.repeatedSubstringPattern("aba")
False # No repeated substring pattern
"""
if not text:
return False
prefix_table = [0] * len(text)
self.build_prefix_table(prefix_table, text)
last_prefix_length = prefix_table[-1]
potential_substring_length = len(text) - last_prefix_length
return (last_prefix_length != 0 and
len(text) % potential_substring_length == 0)
def build_prefix_table(self, prefix_table: list[int], pattern: str) -> list[int]:
"""
Builds the KMP prefix table (also known as the partial match table)
for the given pattern string.
Args:
prefix_table: Pre-allocated list to store the prefix values
pattern: The string to build the prefix table for
Returns:
list[int]: The prefix table where each index i contains the length of the longest proper prefix that is also a suffix for pattern[:i+1]
"""
prefix_table[0] = 0
prefix_length = 0
for current_pos in range(1, len(pattern)):
# Try to extend the previous prefix
while prefix_length > 0 and pattern[current_pos] != pattern[prefix_length]:
prefix_length = prefix_table[prefix_length - 1]
# If characters match, extend the prefix
if pattern[current_pos] == pattern[prefix_length]:
prefix_length += 1
prefix_table[current_pos] = prefix_length
return prefix_table
Complexity
Time Complexity: O(n)
Space Complexity: O(n)
Key Takeaways
- Two Effective Approaches:
- String Concatenation: A clever O(n) solution using string duplications
- KMP Algorithm: An efficient pattern matching solution using prefix tables
- String Concatenation Method:
- If s is repeatable, s will appear in
(s + s)[1:-1]
- Simpler to implement but requires more space
- Time and space complexity both O(n)
- If s is repeatable, s will appear in
- KMP Algorithm Method:
- Uses prefix table to find repeated patterns
- Can determine repeatability by checking if
length - prefix_length
divides the string length - More complex to implement but provides deeper pattern insight
- O(n) time complexity with O(n) space
- Algorithmic Insights:
- Pattern matching problems often have multiple valid approaches
- Understanding the mathematical proof behind each solution is crucial
- Edge cases handling is essential for both methods
Conclusion
The Repeated Substring Pattern problem demonstrates how string pattern matching can be solved through different algorithmic approaches. While the String Concatenation method offers an elegant and concise solution, the KMP Algorithm provides a more fundamental understanding of pattern matching mechanics. Both solutions achieve O(n) time complexity, making them efficient for large-scale applications. This problem serves as an excellent example of how deep understanding of string properties and pattern matching algorithms can lead to efficient solutions in string manipulation tasks. For coding interviews, knowing both approaches provides flexibility in choosing the most appropriate solution based on the specific requirements and constraints of the problem at hand.
Discussion