import string
a, b, c = string.ascii_lowercase, string.ascii_uppercase, string.digits
l1 = [(i, a, b, c) for i in range(100, 1000000)]  # all
l2 = [(i,) for i in range(150, 340192)]  # used
x = [item for item in l1 if (item[0],) not in l2]
print(len(x))  # takes at least 50 seconds to run
**I'm looking to improve my code or make it faster. Any ideas?**
CodePudding user response:
l2 is a very large list of singleton tuples, and every iteration checks it for membership, which means a linear scan of the list each time. Since you only need membership tests, turning it into a set will be much more efficient.
# construct a set from the range
s_l2 = set(range(150,340192))
# check for set membership
x = [item for item in l1 if item[0] not in s_l2]
Benchmark:
%timeit x = [item for item in l1 if (item[0],) not in l2]
# 6.55 s ± 326 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit y = [item for item in l1 if item[0] not in range(150,340192)]
# 4.22 s ± 56.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit s_l2 = set(range(150,340192)); z = [item for item in l1 if item[0] not in s_l2]
# 141 ms ± 3.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
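To check that the set version returns the same result as the original, here is a minimal sketch. The ranges are scaled down (an assumption made only so the slow list version finishes quickly), and the names `slow`, `fast`, and `used` are illustrative, not from the question.

```python
import string

a, b, c = string.ascii_lowercase, string.ascii_uppercase, string.digits

# Scaled-down ranges (assumed for illustration, not the question's sizes).
l1 = [(i, a, b, c) for i in range(100, 3000)]  # all
l2 = [(i,) for i in range(150, 1200)]          # used, as singleton tuples

# Original approach: every check scans the list l2 from the start.
slow = [item for item in l1 if (item[0],) not in l2]

# Set approach: unpack the singleton tuples once, then use hash lookups.
used = {i for (i,) in l2}
fast = [item for item in l1 if item[0] not in used]

print(slow == fast, len(fast))  # True 1850
```

If l2 already exists as a list of tuples, `set(l2)` also works, provided the membership test keeps the tuple form `(item[0],)`.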
CodePudding user response:
A simple approach is to test directly whether each element falls in the range, without generating l2 at all:
>>> from timeit import timeit
>>> r = range(150, 340192)
>>> timeit(lambda: [item for item in [(i, a, b, c) for i in range(100,1000000)] if item[0] not in r], number=7)
1.5238849000015762
The range test can also be written as a plain comparison:
>>> timeit(lambda: [item for item in [(i, a, b, c) for i in range(100,1000000)] if not 150 <= item[0] < 340192], number=7)
1.341257099993527
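As a rough sanity check that the two tests above agree, the sketch below compares `in r` with the chained comparison at the boundary values. For plain integers, CPython answers `i in range(...)` arithmetically rather than by iterating, so both forms are constant-time. The helper `in_range_by_comparison` and the `samples` list are hypothetical names chosen for illustration.

```python
r = range(150, 340192)


def in_range_by_comparison(i):
    # Same half-open interval as r: 150 is included, 340192 is excluded.
    return 150 <= i < 340192


# Values around both boundaries, plus the extremes used in the question.
samples = [100, 149, 150, 151, 340191, 340192, 999999]

# Both membership tests give the same answer for every sample.
agree = all((i in r) == in_range_by_comparison(i) for i in samples)

kept = [i for i in samples if i not in r]
print(agree, kept)  # True [100, 149, 340192, 999999]
```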
The simplest way is to build the target ranges in advance and then generate the target list in a single loop:
>>> from itertools import chain
>>> timeit(lambda: [(i, a, b, c) for i in chain(range(100, 150), range(340192, 1000000))], number=7)
0.5550074999919161
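A minimal sketch to confirm that chaining the two complementary ranges yields the same list as filtering the full range, assuming `a`, `b`, and `c` are defined as in the question. The names `filtered` and `direct` are illustrative.

```python
import string
from itertools import chain

a, b, c = string.ascii_lowercase, string.ascii_uppercase, string.digits

# Filter the full range, skipping the "used" interval [150, 340192).
filtered = [(i, a, b, c) for i in range(100, 1000000) if not 150 <= i < 340192]

# Build only the values that survive: [100, 150) followed by [340192, 1000000).
direct = [(i, a, b, c) for i in chain(range(100, 150), range(340192, 1000000))]

print(filtered == direct, len(direct))  # True 659858
```

This only works because the "used" values form one contiguous range. If they were scattered, the set-based membership test would be the more general choice.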