How to efficiently get common items from two lists that may have duplicates?

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']

The common items are:

c = ['a', 'b', 'a']

The code:

c = []
for e in my_list:
    if e in my_list_2:
        c.append(e)

If my_list is long, this is very inefficient: each membership test e in my_list_2 scans the whole second list, so the loop does O(len(my_list) * len(my_list_2)) work. If I convert both lists into sets and use set's intersection() to get the common items, I lose the duplicates in my_list.

How to deal with this efficiently?

CodePudding user response:

dict is already a hashmap, so lookups are practically as efficient as a set's, and you may not need any extra work to collect the values. (If it weren't, you could pack the keys into a set to check against before checking the dict.)

However, a larger improvement may be to make a generator for the values, rather than creating a new intermediate list, and to iterate over it where you actually want the values:

def foo(src_dict, check_list):
    for value in check_list:
        if value in src_dict:  # dict membership checks keys in O(1) on average
            yield value
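
For example, a quick sketch calling it with the question's data — here my_dict is a stand-in I'm assuming, since the pre-edit question apparently passed a dict:

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_dict = {'a': 1, 'b': 2, 'c': 3}

print(list(foo(my_dict, my_list)))  # ['a', 'b', 'a']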

With the edit (both inputs are now lists), you may find you're better off packing the source list into a set:

def foo(src_list, check_list):
    hashmap = set(src_list)  # build the set once so each lookup is O(1) on average
    for value in check_list:
        if value in hashmap:
            yield value
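
Usage with the question's lists (note the argument order: the list whose duplicates you want to preserve goes in as check_list):

print(list(foo(my_list_2, my_list)))  # ['a', 'b', 'a']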

If you know a lot about the inputs, you can do better, but that's an unusual case. For example, if the lists are sorted you could use binary search (bisect); or if you have a huge verifying list and very few values to check against it, you may find some efficiency in how you order the work and whether you make a set at all.
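
For instance, if the verifying list is sorted, a minimal sketch of the bisect idea (using the standard-library bisect module; common_sorted is just an illustrative name):

from bisect import bisect_left

def common_sorted(sorted_src, check_list):
    # Assumes sorted_src is sorted ascending; each lookup is O(log n).
    for value in check_list:
        i = bisect_left(sorted_src, value)
        if i < len(sorted_src) and sorted_src[i] == value:
            yield value

print(list(common_sorted(sorted(my_list_2), my_list)))  # ['a', 'b', 'a']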

CodePudding user response:

I am not sure about time efficiency, but, personally speaking, a list comprehension would always be my first choice:

[x for x in my_list if x in my_list_2]

Output

['a', 'b', 'a']
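
On the time-efficiency point: this comprehension scans my_list_2 once for every element of my_list. If the inputs are large, a common tweak is to build a set once first (lookup is just an illustrative name) — a minimal sketch:

lookup = set(my_list_2)
[x for x in my_list if x in lookup]
# ['a', 'b', 'a']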