for the problem on leetcode 'Top K Frequent Elements' https://leetcode.com/problems/top-k-frequent-elements/submissions/
there is a solution that completes the task in just 88 ms, mine completes the tasks in 124 ms, I see it as a large difference.
I tried to understand why buy docs don't provide the way the function I use is implemented which is most_common(), if I want to dig a lot in details like that, such that I can write algorithms that run so fast in the future what should I read(specific books? or any other resource?).
my code (124 ms)
def topKFrequent(self, nums, k):
if k ==len(nums):
return nums
c=Counter(nums)
return [ t[0] for t in c.most_common(k) ]
other (88 ms) (better in time)
def topKFrequent(self, nums, k):
if k == len(nums):
return nums
count = Counter(nums)
return heapq.nlargest(k, count.keys(), key=count.get)
both are nearly taking same amount of memory, so no difference here.
CodePudding user response:
The implementation of most_common
also uses heapq.nlargest
, but it calls it with count.items()
instead of count.keys()
. This will make it a tiny bit slower, and also requires the overhead of creating a new list, in order to extract the [0]
value from each element in the list returned by most_common()
.
The heapq.nlargest
version just avoids this extra overhead, and passes count.keys()
as second argument, and therefore it doesn't need to iterate that result again to extract pieces into a new list.
CodePudding user response:
@trincot seems to have answered the question but if anyone is looking for a faster way to do this then use Numpy, provided nums
can be stored as a np.array:
def topKFrequent_numpy(nums, k):
unique, counts = np.unique(nums, return_counts=True)
return unique[np.argsort(-counts)[:k]]
One speed test
nums_array = np.random.randint(1000, size=1000000)
nums_list = list(nums_array)
%timeit topKFrequent_Counter(nums_list, 500)
# 116 ms ± 4.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit topKFrequent_heapq(nums_list, 500)
# 117 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit topKFrequent_numpy(nums_array, 500)
# 39.2 ms ± 185 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
(Speeds may be dramatically different for other input values)