LeetCode Link: 242. Valid Anagram

Problem Definition

Given two strings s and t, return true if t is an anagram of s, and false otherwise.

Example 1:

Input: s = "anagram", t = "nagaram"

Output: true

Example 2:

Input: s = "rat", t = "car"

Output: false

Constraints:

  • 1 <= s.length, t.length <= 5 * 104
  • s and t consist of lowercase English letters.

Thought Process

Let's dive straight into an efficient solution for this problem. While a brute force approach using nested loops would give us O(n²) time complexity, we can do much better.

When working with strings containing only lowercase letters, we can efficiently determine if two strings are anagrams using an array as a frequency counter, which acts as a lightweight hash table. Why an array? Because the ASCII values of lowercase letters (a to z) form a continuous sequence of 26 values, making an array with 26 slots perfect for this task.

If you're new to the relationship between arrays, sets, maps, and hash tables, I've written about this topic in depth in my post:

Hash Table Theoretical Foundations
Discover the power of hash tables - a fundamental data structure that optimizes data lookup operations. Learn how they work, when to use them, and explore Python’s implementation through sets and dictionaries. Master efficient solutions for rapid element verification with O(1) time complexity.

Example: Strings s = "aee" and t = "eae"

  1. The Concept
    We’ll use an array, record, with 26 positions (one for each lowercase letter), initialized to 0. Each position corresponds to a letter of the alphabet:
    • Index 0 represents a, index 1 represents b, ..., index 25 represents z.
  2. Mapping Characters to Array Indices
    To find the index for a character, subtract the ASCII value of 'a' from the ASCII value of the character:
    • For example, 'a' - 'a' = 0, 'e' - 'a' = 4.
  3. Algorithm Steps:
    • Traverse string s and increment the counter for each character.
    • Traverse string t and decrement the counter for each character.
    • After processing both strings, check if all positions in record are 0.
      • If yes, the strings are anagrams.
      • If not, they’re not anagrams.

Solution 1 - 26-Slot

class Solution:
    def isAnagram(self, s: str, t: str) -> bool:
        record = [0] * 26
        for i in s:
            # No need to remember the ASCII value of 'a';
            # just calculate a relative index value.
            record[ord(i) - ord("a")] += 1
        for i in t:
            # Subtract the count for each character in string t.
            record[ord(i) - ord("a")] -= 1
        for i in range(26):
            if record[i] != 0:
                # If any element in the record array is not zero,
                # it means one of the strings has extra or missing characters.
                return False
        return True

The ord function in Python returns the Unicode code point (integer value) of a given character. It is short for "ordinal," which refers to the numerical representation of characters in the Unicode standard.

How It Works

  • Each character has a corresponding integer value in Unicode. For example:
    • 'a' → 97
    • 'b' → 98
    • 'A' → 65
    • ' ' (space) → 32

This function checks if two strings s and t are anagrams by comparing their character frequencies:

Use Case in the Code:

In the anagram-checking solutions, ord is used to calculate an array index for each character by subtracting the Unicode value of 'a'. For example:

index = ord(char) - ord('a')

This maps:

  • 'a' to 0
  • 'b' to 1
  • 'z' to 25

This is efficient when working with lowercase English letters, as it provides a fixed range of indices (0–25) that can directly correspond to array positions.

Code Explanation

  1. Frequency Count Using record:
    • record is an array of size 26 (for each lowercase letter).
    • For every character in s, increment the corresponding index in record.
    • For every character in t, decrement the corresponding index.
  2. Verify Anagram Property:
    • After processing both strings, if all values in record are 0, the strings are anagrams (they have the same characters with the same frequencies).
    • If any value in record is non-zero, the strings are not anagrams.

Complexity

  • Time Complexity:
    Traversing both strings takes O(n), where n is the length of the strings.
  • Space Complexity:
    The record array always has 26 positions, making space complexity O(1).

Solution 2 - collections.defaultdict()

class Solution:
    def isAnagram(self, s: str, t: str) -> bool:
        from collections import defaultdict  # Import defaultdict from collections module
        
        s_dict = defaultdict(int)  # Initialize a dictionary for string `s` with default value 0
        t_dict = defaultdict(int)  # Initialize a dictionary for string `t` with default value 0
        
        for x in s:
            # Count the frequency of each character in string `s`
            s_dict[x] += 1
        
        for x in t:
            # Count the frequency of each character in string `t`
            t_dict[x] += 1
        
        # Compare the two dictionaries; if they are identical, the strings are anagrams
        return s_dict == t_dict

How defaultdict Works

  • defaultdict is a subclass of Python’s built-in dict that provides a default value for a key that does not yet exist in the dictionary.
  • When you try to access or modify a key that isn’t present in the dictionary:
    • defaultdict automatically creates the key with the default value.
    • No KeyError is raised.

If you use a regular dict, attempting to increment a key that doesn’t exist would raise a KeyError. You’d have to explicitly check whether the key is in the dictionary before modifying it:

Code Explanation

  1. Using defaultdict for Frequency Counting:
    • A defaultdict is used to store the frequency of each character in both strings.
    • It automatically initializes missing keys with a default value of 0.
  1. Building Frequency Dictionaries:
    • Loop through s to count the frequency of each character in s_dict.
    • Loop through t to count the frequency of each character in t_dict.
  1. Compare Frequency Dictionaries:
    • If s_dict and t_dict are identical, it means s and t have the same characters in the same frequencies, so they are anagrams.
    • If the dictionaries differ, they are not anagrams.

Complexity

Time Complexity: O(n)

  1. First loop through string s: O(n) where n is the length of s
  2. Second loop through string t: O(m) where m is the length of t
  3. Dictionary comparison operation (s_dict == t_dict): O(k) where k is the number of unique characters
    • In Python, dictionary comparison checks each key-value pair
    • Since we're dealing with lowercase letters only, k is at most 26

Therefore, the total time complexity is O(n + m + k) which simplifies to O(n).

Space Complexity: O(1)

  1. s_dict stores at most 26 key-value pairs (one for each lowercase letter)
  2. t_dict also stores at most 26 key-value pairs
  3. Even though we're using dictionaries, their size is bounded by the alphabet size
  4. Therefore, the space requirement is constant regardless of input size

Advantages of defaultdict

  1. Simpler Code:
    • Using defaultdict, you skip the if-else check entirely.
    • Missing keys are initialized with the default value (in this case, 0) automatically.
  1. Readability:
    • The code is more concise and easier to understand, as the frequency counting logic is directly implemented in one line:s_dict[x] += 1
  1. Efficiency:
    • Avoiding the if-else check can make the code slightly faster in large datasets.

Key Takeaway

Using defaultdict is ideal for counting problems like this one, where all keys need to have a default value (0) before any operations are performed. While a regular dict can also work, it requires additional code to handle missing keys explicitly.

Solution 3 - collections.Counter

class Solution(object):
    def isAnagram(self, s: str, t: str) -> bool:
        from collections import Counter  # Import Counter from the collections module
        
        a_count = Counter(s)  # Count the frequency of each character in string `s`
        b_count = Counter(t)  # Count the frequency of each character in string `t`
        
        # Compare the two frequency dictionaries;
        # if they are identical, the strings are anagrams
        return a_count == b_count

This solution uses Python's Counter from the collections module to check if two strings s and t are anagrams by comparing their character frequencies.

Code Explanation

  1. Using Counter:

Counter automatically counts the occurrences of each character in the input string and stores the counts in a dictionary-like structure.

For example:

Counter("aabbcc")  # Output: {'a': 2, 'b': 2, 'c': 2}
  1. Count the Frequencies:
    • a_count stores the character counts for string s.
    • b_count stores the character counts for string t.
  1. Compare the Two Counters:
    • If a_count and b_count are identical, it means s and t have the same characters in the same frequencies, so they are anagrams.
    • If the counters differ, the strings are not anagrams.

Complexity

Time Complexity: O(n)

  1. Counter(s) operates in O(n) time where n is the length of string s
  2. Counter(t) operates in O(m) time where m is the length of string t
  3. Dictionary comparison (a_count == b_count) takes O(k) where k is the number of unique characters
    • For lowercase letters, k ≤ 26

Total time complexity is O(n + m + k) which simplifies to O(n).

Space Complexity: O(1)

  1. a_count stores at most 26 entries (one per lowercase letter)
  2. b_count also stores at most 26 entries
  3. Since we're limited to lowercase letters, the space used is bounded
  4. Thus space usage is constant regardless of input size