my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
The common items are:
c = ['a', 'b', 'a']
The code:
for e in my_list:
if e in my_list_2:
c.append(e)
...
If the my_list is long, this would be very inefficient. If I convert both lists into two sets, then use set's intersection() function to get the common items, I will lose the duplicates in my_list.
How to deal with this efficiently?
CodePudding user response:
dict
is already a hashmap, so lookups are practically as efficient as a set
, so you may not need to do any extra work collecting the values - if it wasn't, you could pack the values into a set
to check before checking the dict
However, a large improvement may be to make a generator for the values, rather than creating a new intermediate list, to iterate over where you actually want the values
def foo(src_dict, check_list):
for value in check_list:
if value in my_dict:
yield value
With the edit, you may find you're better off packing all the inputs into a set
def foo(src_list, check_list):
hashmap = set(src_list)
for value in check_list:
if value in hashmap:
yield value
If you know a lot about the inputs, you can do better, but that's an unusual case (for example if the lists are ordered you could bisect, or if you have a huge verifying list or very very few values to check against it you may find some efficiency in the ordering and if you make a set
)
CodePudding user response:
I am not sure about time efficiency, but, personally speaking, list comprehension would always be more of interest to me:
[x for x in my_list if x in my_list_2]
Output
['a', 'b', 'a']