Converting to set 'before' starting loop is faster in Python?

Time:04-15

I have a large 1D list arr1 of length 100000 which may contain duplicates and another list arr2 which contains many of the elements in arr1 but cannot have duplicates. I wish to append all the elements of arr1 that are also in arr2 into a third list arr3:

file = []
with open('input.txt') as inputfile:
    for line in inputfile:
        file.append(line.strip().split(' '))

arr1 = file[1]     # 2nd line of input file
arr2 = file[2]     # 3rd line of input file

arr2 = set(arr2)
arr3 = [element for element in arr1 if element in arr2]

Works fine. But when I try:

arr3 = [element for element in arr1 if element in set(arr2)]

as opposed to the last two lines, I would expect exactly the same result because the two versions appear to be equivalent, but it takes forever to run this way. Are these somehow different?

Here is the input file.

CodePudding user response:

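The if condition in the list comprehension is evaluated on every iteration, so set(arr2) is rebuilt from the whole list for each element of arr1. With 100000 elements in arr1, that means 100000 full conversions instead of one.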
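
A small sketch can make the repeated conversion visible. The counting_set helper and the tiny lists below are made up for illustration; they are not part of the question's code or input file.

```python
calls = 0

def counting_set(items):
    """Hypothetical wrapper around set() that counts how often it is called."""
    global calls
    calls += 1
    return set(items)

# Tiny stand-in lists (assumed data, not the original input file).
arr1 = ["a", "b", "a", "c", "d"]
arr2 = ["a", "c"]

# The condition runs once per element of arr1, so the set is built each time.
arr3 = [element for element in arr1 if element in counting_set(arr2)]

print(arr3)   # ['a', 'a', 'c']
print(calls)  # 5, one conversion per element of arr1
```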

The solution is to convert arr2 to a set once, before the loop, as your first version already does.
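
To see the difference on your own machine, something like the following rough timing sketch should work. The generated lists are stand-ins (the real input file is not reproduced here), the function names are invented for this example, and the sizes are kept small so the slow version still finishes quickly.

```python
import timeit

# Assumed stand-in data: arr1 has duplicates, arr2 does not.
arr1 = [str(i % 500) for i in range(2000)]
arr2 = [str(i) for i in range(0, 1000, 2)]

def convert_once():
    lookup = set(arr2)  # built a single time, before the loop
    return [element for element in arr1 if element in lookup]

def convert_every_iteration():
    # set(arr2) is re-evaluated for each element of arr1
    return [element for element in arr1 if element in set(arr2)]

# Both versions produce the same list; only the running time differs.
assert convert_once() == convert_every_iteration()

print("set built once:      ", timeit.timeit(convert_once, number=5))
print("set built every time:", timeit.timeit(convert_every_iteration, number=5))
```

The exact numbers depend on the machine, but the gap should widen as arr1 and arr2 grow, since the second version does work proportional to len(arr1) * len(arr2).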
