Home > Enterprise >  Higher time complexity but still faster
Higher time complexity but still faster

Time:08-28

I have two functions to check whether two words word1 and word2 are anagrams or not (note: two words are said to be anagrams if you can rearrange the letters of one of the words to get the other).

def is_anagram(word1, wrod2):
    histogram = {}
    for char in word1:
        histogram[char] = histogram.get(char, 0)   1
    # Trying to exhaust all the letters in a histogram by second word
    for char in word2:
        histogram[char] = histogram.get(char, 0) - 1
    for vals in histogram.values():
        if vals != 0: return False
    return True

Clearly, in the above function 3 loops are running so it has an overall complexity of O(n)

Here is the second implementation:

def is_anagram2(word1, word2):
    sorted_word1 = ''.join(sorted(word1))
    sorted_word2 = ''.join(sorted(word2))
    return sorted_word1 == sorted_word2

The sorted function has a complexity of nlogn, so the complexity of this function should be O(nlogn).

But still if you measure the execution time of theses two functions (e.g. through timeit command in ipython), it is found that the is_anagram2 function is faster.

Please explain why...

CodePudding user response:

This is because the second function is leaving all the important work to the underlying native code of the Python interpreter (namely, the sorted function). The first function is instead doing everything in Python code, breaking the task into smaller operations.

Python (CPython at least, which is the official implementation) is implemented in C, and C code is an order of magnitude faster than Python code. This is because C is optimized and compiled to machine code that runs directly on your CPU. On the other hand, Python code runs on top of a virtual machine implemented in C (the Python interpreter): it has to be parsed, turned into Python bytecode, and then every single bytecode operation needs to be executed by the interpreter, which is much slower.

This is a common problem that will come up countless times when optimizing for performance in Python. Sometimes it is just better to leave native code do the job because the overhead of Python bytecode outweighs its advantages. This is also the reason why libraries such as NumPy are very popular: they implement most of the functionality in C, and only expose high level APIs through Python modules, so they can be very efficient for certain tasks if used correctly instead of plain Python code.

  • Related