347. Top K Frequent Elements
Discover three efficient approaches to solve LeetCode 347 Top K Frequent Elements problem. Learn how to leverage min-heaps, bucket sort, and traditional sorting methods, with detailed complexity analysis and Python implementations for optimal performance.
LeetCode Link: 347. Top K Frequent Elements
Problem Definition
Given an integer array nums
and an integer k
, return the k
most frequent elements. You may return the answer in any order.
Example 1:
Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]
Example 2:
Input: nums = [1], k = 1
Output: [1]
Constraints:
1 <= nums.length <= 105
-104 <= nums[i] <= 104
k
is in the range[1, the number of unique elements in the array]
.- It is guaranteed that the answer is unique.
Thought Process
Let's break this problem into three core components:
- Counting element frequencies
- Sorting these frequencies
- Finding the k most frequent elements
The Common Foundation: Counting Frequencies
Every solution starts with a crucial first step: counting how often each number appears. Think of it like taking attendance in a classroom – we're simply keeping track of how many times we see each number. We'll use a hash map for this, which gives us quick lookups and updates.
The Min-Heap Solution: A Counter-Intuitive Solution
Here's where things get interesting. While you might think we'd want a max-heap to find the largest frequencies, we actually want the opposite! By using a min-heap of size k, we create a clever filtering system.
Think of this like maintaining a small VIP room at a club that only holds k people. Here's how it works:
- We start with an empty VIP room (our min-heap) that can hold k people
- When someone new arrives (a number with its frequency):
- If the room isn't full (heap size < k), they get in automatically
- If the room is full, we compare their "popularity" (frequency) with the least popular person in the room
- If they're more popular, the least popular person leaves and they get in
- If they're less popular, they don't get in
By the end, our VIP room naturally contains the k most frequent elements, because we always kept kicking out the least frequent ones when someone more frequent came along.
The Bucket Sort Approach: Using Position as Power
Bucket sort offers an elegant solution that might seem magical at first. Imagine having a series of buckets numbered by frequency. Each number goes into the bucket matching how often it appears. Want the most frequent elements? Just start from the last bucket and work backwards.
Imagine you have a bookshelf where each shelf number represents how many times a book has been borrowed. Here's the process:
- First, we count how many times each book (number) has been borrowed
- Then, we create shelves numbered 1, 2, 3, up to the maximum times any book was borrowed
- We place each book on the shelf matching how many times it was borrowed
- To find the k most popular books, we simply start from the highest shelf and work our way down until we have k books
For example, if we have numbers [1,1,1,2,2,3]:
- Number 1 appears 3 times → goes on shelf 3
- Number 2 appears 2 times → goes on shelf 2
- Number 3 appears 1 time → goes on shelf 1
To get the top 2 most frequent elements (k=2), we just grab number 1 from shelf 3 and number 2 from shelf 2.
The beauty of bucket sort is that we don't need to compare elements directly - their frequency automatically determines their position, making it extremely efficient for this type of problem.
The Traditional Sorting Approach: Simple but Costly
While not the most efficient, sorting by frequency is the most straightforward approach. It's like arranging students by their test scores – simple to understand but takes more time as the class size grows.
Making the Right Choice
Each approach has its sweet spot. For smaller arrays, the difference might be negligible. But as your data grows, the efficiency of bucket sort or the elegance of the min-heap solution becomes more apparent. Consider your specific needs – are you dealing with a limited range of frequencies? Bucket sort might be your best friend. Need a solution that's memory efficient? The min-heap approach might be your answer.
Remember, in real-world applications, readability and maintainability often trump minor performance gains. Choose the solution that best balances efficiency with clarity for your specific use case.
Solutions
Using a Min-Heap (Priority Queue)
In this solution:
- Count the frequency of each number using a dictionary.
- Use a min-heap of size
k
to maintain the topk
most frequent elements. - Extract the top
k
elements from the heap.
import heapq
from typing import List
class Solution:
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
# Step 1: Count frequencies
frequency_map = {}
for num in nums:
frequency_map[num] = frequency_map.get(num, 0) + 1
# Step 2: Use a min-heap to keep the top k elements
min_heap = []
for num, freq in frequency_map.items():
heapq.heappush(min_heap, (freq, num))
if len(min_heap) > k: # Remove the smallest frequency if heap size exceeds k
heapq.heappop(min_heap)
# Step 3: Extract the elements from the heap
return [num for freq, num in min_heap]
Complexity Analysis
- Time Complexity: O(nlogk), where n is the size of the array.
- Building the frequency map takes O(n).
- Each
heappush
andheappop
operation takes O(logk), and we perform these operations for each unique element.
- Space Complexity: O(k+u), where u is the number of unique elements.
- The frequency map uses O(u) space, and the heap uses O(k) space.
Bucket Sort
This approach uses a bucket sort technique, which is ideal when the range of frequencies is limited:
- Count the frequency of each element using a dictionary.
- Use a list of buckets, where the index represents the frequency, and the value is a list of numbers with that frequency.
- Iterate through the buckets in reverse order to collect the top
k
elements.
from typing import List
from collections import defaultdict
class Solution:
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
# Step 1: Count frequencies
frequency_map = defaultdict(int)
for num in nums:
frequency_map[num] += 1
# Step 2: Create buckets where index represents frequency
max_freq = max(frequency_map.values())
buckets = [[] for _ in range(max_freq + 1)]
for num, freq in frequency_map.items():
buckets[freq].append(num)
# Step 3: Collect top k elements from buckets
result = []
for freq in range(len(buckets) - 1, 0, -1):
for num in buckets[freq]:
result.append(num)
if len(result) == k:
return result
Advantages of This Approach
- Avoids Sorting: No explicit sorting of elements by frequency is required.
- Efficient: Time complexity is O(n), where n is the size of the input array.
- Simple to Implement: Uses basic list operations and is intuitive to follow.
Complexity Analysis:
- Time Complexity: O(n).
- Counting frequencies takes O(n).
- Placing elements in buckets and extracting the top
k
elements are O(n).
- Space Complexity: O(n).
- The buckets and frequency map together use O(n) space.
Using Sorting (Not Optimal for Large n
)
For completeness, here’s a solution using sorting:
- Count frequencies using a dictionary.
- Sort the elements by frequency in descending order.
- Return the top
k
elements.
class Solution:
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
# Step 1: Count frequencies
frequency_map = {}
for num in nums:
frequency_map[num] = frequency_map.get(num, 0) + 1
# Step 2: Sort by frequency in descending order
sorted_elements = sorted(frequency_map.keys(), key=lambda x: frequency_map[x], reverse=True)
# Step 3: Return the top k elements
return sorted_elements[:k]
Complexity Analysis:
- Time Complexity: O(nlogn), dominated by the sorting step.
- Space Complexity: O(u), where u is the number of unique elements.
Key Takeaways
- Use Min-Heap for Efficiency: The heap solution ensures that we only maintain the top
k
elements, achieving O(nlogk) complexity. - Bucket Sort is Elegant for Range-Limited Problems: If the range of possible frequencies is limited, bucket sort provides a clean and efficient solution with O(n) complexity.
- Sorting is a Simpler Alternative but Less Efficient: Sorting-based solutions are intuitive but not ideal for large inputs due to O(nlogn) complexity.
Discussion