I'm studying algorithms and I have a question about the difference between isdigit()
and my_isdigit()
.
import time
def my_isdigit(str):
s = set(['1', '2', '3', '4', '5', '6', '7', '8', '9', '0'])
for x in str:
if x not in s:
return False
return True
test_str = '1234567890' * 10000000
start = time.time()
test_str.isdigit()
print(time.time() - start) # 0.21125102043151855
start = time.time()
my_isdigit(test_str)
print(time.time() - start) # 2.169161319732666
Why is isdigit()
much faster than mine?
I saw What is the fastest way to make sure a str is a int? but I cannot get the reason.
Is it the difference between C and Python? Or is my code bad?
CodePudding user response:
c language is compiler type language and also very close to machine level code/language, hence the run time for c is far more less when compared to python which is interpreted language.
CodePudding user response:
Firstly I want to note that your test suite is extremely limited, you are considering solely case when string is made of digits and not where there is mix of digits and non-digits and solely non-digits.
You should find timeit
useful, note that it can easily repeat given test many times. timeit
is part of standard library.
Regarding your function I would ameloriate it following way
def my_isdigit(s):
digits = {'1', '2', '3', '4', '5', '6', '7', '8', '9', '0'}
return digits.issuperset(s)
Observe that I do not use str
as it is built-in, you should not overshadow built-name variables unless really neccessary. I use set arithmetics to find if set of digits is superset of set of characters in s. I also used set literal rather than set
function to convert list into set.
CodePudding user response:
Builtin functions such as str.isdigit()
are implemented in C (when using CPython), and C is a lot faster than Python.
You can find the C code for str.isdigit
here; it calls these helpers which look up the character class in Python's built-in Unicode database.
Since str
s are Unicode, and Unicode aware, this leads into a possibly surprising result: characters that aren't Latin numerals are digit
too:
>>> '፫'.isdigit()
True
I tested a couple of different approaches (from this thread and otherwise): even faster than str.isdigit()
, and limited to Latin numerals only is to encode the Unicode string into ASCII bytes first, then run isdigit
on it (which is a different implementation that uses a simpler lookup table for the 256 possible ASCII bytes):
def bytes_isdigit(s):
return s.encode('ascii', 'ignore').isdigit()
# ...
name='my_isdigit' iters=1 time=2.788 iters_per_sec=0.36
name='builtin' iters=5 time=0.425 iters_per_sec=11.75
name='daweo' iters=1 time=1.011 iters_per_sec=0.99
name='khelwood' iters=1 time=4.837 iters_per_sec=0.21
name='bytes_isdigit' iters=5 time=0.309 iters_per_sec=16.18