Why is the built-in string.isdigit() implementation faster than my own implementation?-CodePudding

I'm studying algorithms and I have a question about the difference between isdigit() and my_isdigit().

import time


def my_isdigit(str):
    s = set(['1', '2', '3', '4', '5', '6', '7', '8', '9', '0'])
    for x in str:
        if x not in s:
            return False
    return True


test_str = '1234567890' * 10000000
start = time.time()
test_str.isdigit()
print(time.time() - start)  # 0.21125102043151855


start = time.time()
my_isdigit(test_str)
print(time.time() - start)  # 2.169161319732666

Why is isdigit() much faster than mine?

I saw What is the fastest way to make sure a str is a int? but I cannot get the reason.

Is it the difference between C and Python? Or is my code bad?

CodePudding user response：

c language is compiler type language and also very close to machine level code/language, hence the run time for c is far more less when compared to python which is interpreted language.

CodePudding user response：

Firstly I want to note that your test suite is extremely limited, you are considering solely case when string is made of digits and not where there is mix of digits and non-digits and solely non-digits.

You should find timeit useful, note that it can easily repeat given test many times. timeit is part of standard library.

Regarding your function I would ameloriate it following way

def my_isdigit(s):
    digits = {'1', '2', '3', '4', '5', '6', '7', '8', '9', '0'}
    return digits.issuperset(s)

Observe that I do not use str as it is built-in, you should not overshadow built-name variables unless really neccessary. I use set arithmetics to find if set of digits is superset of set of characters in s. I also used set literal rather than set function to convert list into set.

CodePudding user response：

Builtin functions such as str.isdigit() are implemented in C (when using CPython), and C is a lot faster than Python.

You can find the C code for str.isdigit here; it calls these helpers which look up the character class in Python's built-in Unicode database.

Since strs are Unicode, and Unicode aware, this leads into a possibly surprising result: characters that aren't Latin numerals are digit too:

>>> '፫'.isdigit()
True

I tested a couple of different approaches (from this thread and otherwise): even faster than str.isdigit(), and limited to Latin numerals only is to encode the Unicode string into ASCII bytes first, then run isdigit on it (which is a different implementation that uses a simpler lookup table for the 256 possible ASCII bytes):

def bytes_isdigit(s):
    return s.encode('ascii', 'ignore').isdigit()

# ...
name='my_isdigit' iters=1 time=2.788 iters_per_sec=0.36
name='builtin' iters=5 time=0.425 iters_per_sec=11.75
name='daweo' iters=1 time=1.011 iters_per_sec=0.99
name='khelwood' iters=1 time=4.837 iters_per_sec=0.21
name='bytes_isdigit' iters=5 time=0.309 iters_per_sec=16.18