why these algorithms differ in their execution time?-CodePudding

for the problem on leetcode 'Top K Frequent Elements' https://leetcode.com/problems/top-k-frequent-elements/submissions/

there is a solution that completes the task in just 88 ms, mine completes the tasks in 124 ms, I see it as a large difference.

I tried to understand why buy docs don't provide the way the function I use is implemented which is most_common(), if I want to dig a lot in details like that, such that I can write algorithms that run so fast in the future what should I read(specific books? or any other resource?).

my code (124 ms)

def topKFrequent(self, nums, k):
        if k ==len(nums):
            return nums
        c=Counter(nums)
        return [ t[0] for t in c.most_common(k) ]

other (88 ms) (better in time)

def topKFrequent(self, nums, k):
        if k == len(nums):
            return nums
        count = Counter(nums)   
        return heapq.nlargest(k, count.keys(), key=count.get)

both are nearly taking same amount of memory, so no difference here.

CodePudding user response：

The implementation of most_common also uses heapq.nlargest, but it calls it with count.items() instead of count.keys(). This will make it a tiny bit slower, and also requires the overhead of creating a new list, in order to extract the [0] value from each element in the list returned by most_common().

The heapq.nlargest version just avoids this extra overhead, and passes count.keys() as second argument, and therefore it doesn't need to iterate that result again to extract pieces into a new list.

CodePudding user response：

@trincot seems to have answered the question but if anyone is looking for a faster way to do this then use Numpy, provided nums can be stored as a np.array:

def topKFrequent_numpy(nums, k):
    unique, counts = np.unique(nums, return_counts=True)
    return unique[np.argsort(-counts)[:k]]

One speed test

nums_array = np.random.randint(1000, size=1000000)
nums_list = list(nums_array)

%timeit topKFrequent_Counter(nums_list, 500)
# 116 ms ± 4.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit topKFrequent_heapq(nums_list, 500)
# 117 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit topKFrequent_numpy(nums_array, 500)
# 39.2 ms ± 185 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

(Speeds may be dramatically different for other input values)