Python - Compare lists of dictionaries and return not matches of one of the keys

Time:07-21

I want to compare two lists of dictionaries and get the values from the dictionaries that don't match.

So I have something like this:

list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}] 

list2 = [{'text': 'dog'}] 

I want to get the texts that are not in both lists. The text is the only criterion; it doesn't matter whether the numbers are the same.

The desired result would look like this:

list_notmatch = [{'text': 'cat'},{'text': 'horse'}]

If it's easier or faster, this would be OK too:

list_notmatch = [{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}]

I've seen a similar question (Compare two lists of dictionaries in Python. Return non match), but its output is not exactly what I need, and I don't know if it's the best solution for my case.

The real lists are quite long (list1 could contain more than 10,000 dictionaries), so I need a performant solution (or at least not a very slow one). Order is not important.

Thanks!

Answer:

The first form of output:

Collect the 'text' values of each list into a set, then take the symmetric difference with either the symmetric_difference method or the ^ (xor) operator:

>>> {d['text'] for d in list1} ^ {d['text'] for d in list2}
{'horse', 'cat'}
>>> {d['text'] for d in list1}.symmetric_difference({d['text'] for d in list2})
{'horse', 'cat'}
>>> [{'text': v} for v in _]
[{'text': 'horse'}, {'text': 'cat'}]
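The same idea can be wrapped in a small function that does not depend on the REPL's `_` variable. This is a sketch; the name `unmatched_texts` is hypothetical, and the texts are sorted only to make the output deterministic, since sets are unordered.

```python
def unmatched_texts(list1, list2):
    """Return [{'text': t}] for every text that appears in exactly one of the lists."""
    texts1 = {d['text'] for d in list1}
    texts2 = {d['text'] for d in list2}
    # ^ is the symmetric difference: elements in exactly one of the two sets
    return [{'text': t} for t in sorted(texts1 ^ texts2)]


list1 = [{'text': 'dog', 'number': 10}, {'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]
print(unmatched_texts(list1, list2))  # [{'text': 'cat'}, {'text': 'horse'}]
```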

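Both variants can be tuned a little. With the operator, putting the set built from the shorter list on the left was slightly faster in these timings (on the three-element example lists the difference is small):

>>> from timeit import timeit
>>> timeit(lambda: {d['text'] for d in list1} ^ {d['text'] for d in list2})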
0.59890600000017
>>> timeit(lambda: {d['text'] for d in list2} ^ {d['text'] for d in list1})
0.5732289999996283

If you use the symmetric_difference method, you can pass it any iterable, such as a generator expression or map(), so you don't have to build the second set explicitly:

>>> timeit(lambda: {d['text'] for d in list1}.symmetric_difference({d['text'] for d in list2}))
0.6045051000000967
>>> from operator import itemgetter
>>> timeit(lambda: {d['text'] for d in list1}.symmetric_difference(map(itemgetter('text'), list2)))
0.579385199999706
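The timings above were measured on the three-element example lists. A rough sketch for checking at the question's scale (10,000 dictionaries) follows; the synthetic `item0`, `item1`, ... data is made up purely for illustration, and absolute numbers depend on the machine, so no results are shown here.

```python
from operator import itemgetter
from timeit import timeit

# Hypothetical synthetic data: 10,000 dicts in list1, every even-numbered text also in list2
list1 = [{'text': f'item{i}', 'number': i} for i in range(10_000)]
list2 = [{'text': f'item{i}'} for i in range(0, 10_000, 2)]


def with_operator():
    return {d['text'] for d in list1} ^ {d['text'] for d in list2}


def with_method_and_map():
    return {d['text'] for d in list1}.symmetric_difference(map(itemgetter('text'), list2))


# Both variants must agree before their speed is worth comparing
assert with_operator() == with_method_and_map()

for fn in (with_operator, with_method_and_map):
    seconds = timeit(fn, number=100)
    print(f'{fn.__name__}: {seconds:.3f} s for 100 runs')
```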

The second form of output:

A simple way to get the dictionaries themselves is:

  1. Create a dictionary for each list, mapping each 'text' value to the dictionary it came from.
  2. dict.keys() returns a view that supports set operators such as - and ^ in every Python 3 version, so subtract the key views in both directions to get the two difference sets, then use the resulting keys to look up the original dictionaries in the two mappings.
>>> dict1 = {d['text']: d for d in list1}
>>> dict2 = {d['text']: d for d in list2}
>>> dict1_keys = dict1.keys()    # keys views behave like sets, no conversion needed
>>> dict2_keys = dict2.keys()
>>> [dict1[k] for k in dict1_keys - dict2_keys] + [dict2[k] for k in dict2_keys - dict1_keys]
[{'text': 'horse', 'number': 40}, {'text': 'cat', 'number': 40}]
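The same steps can be wrapped in a function, as a sketch (the name `unmatched_dicts` is hypothetical). If a text occurs more than once in one list, the dict comprehension keeps only the last dictionary with that text, so duplicates collapse to one entry.

```python
def unmatched_dicts(list1, list2):
    """Return the original dicts whose 'text' appears in only one of the lists."""
    # Index each list by its 'text' value; later duplicates overwrite earlier ones
    dict1 = {d['text']: d for d in list1}
    dict2 = {d['text']: d for d in list2}
    # dict.keys() views support set operators directly in Python 3
    only_in_1 = [dict1[k] for k in dict1.keys() - dict2.keys()]
    only_in_2 = [dict2[k] for k in dict2.keys() - dict1.keys()]
    return only_in_1 + only_in_2


list1 = [{'text': 'dog', 'number': 10}, {'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]
result = unmatched_dicts(list1, list2)
print(sorted(result, key=lambda d: d['text']))
# [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
```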

Note that taking the symmetric difference of the keys directly with the xor operator is less convenient here, because each resulting key has to be looked up in whichever dictionary contains it. If you want to use the xor operator, you can merge the two dictionaries (the dict | operator requires Python 3.9 or later) and take the values from the merged one:

>>> list(map((dict1 | dict2).__getitem__, dict1_keys ^ dict2_keys))
[{'text': 'horse', 'number': 40}, {'text': 'cat', 'number': 40}]
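For Python versions before 3.9, which lack the dict | operator, the merge can be written with dictionary unpacking instead. A sketch reusing the same `dict1`/`dict2` mappings; which side wins on shared keys does not matter here, because the symmetric difference only contains keys present in exactly one mapping.

```python
list1 = [{'text': 'dog', 'number': 10}, {'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]

dict1 = {d['text']: d for d in list1}
dict2 = {d['text']: d for d in list2}

# {**a, **b} builds a merged dict (b wins on shared keys), like a | b in Python 3.9+
merged = {**dict1, **dict2}
# Keys present in exactly one mapping, looked up in the merged dict
result = [merged[k] for k in dict1.keys() ^ dict2.keys()]
print(sorted(result, key=lambda d: d['text']))
# [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
```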

Answer:

You can do it in O(N + M) this way:

list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}]

list2 = [{'text': 'dog'}]

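matched = {}   # text -> list of list1 dicts that have that text
no_match = []  # list1 dicts whose text does not appear in list2

for i in list2:
    matched[i['text']] = []

for i in list1:
    if i['text'] in matched:   # average O(1) dict lookup
        matched[i['text']].append(i)
    else:
        no_match.append(i)
matched = matched.values()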

print(matched, no_match)

output

dict_values([[{'text': 'dog', 'number': 10}]]) [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
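As written, this loop only reports dictionaries from list1 whose text is missing from list2; entries that exist only in list2 are never reported. A sketch of a symmetric variant that is still O(N + M) follows. The function name `split_matches` is hypothetical, and the extra 'fish' entry in list2 is made up to show the second direction.

```python
def split_matches(list1, list2):
    """Return (matched, no_match): list1 dicts grouped by shared text, plus unmatched dicts from both lists."""
    matched = {d['text']: [] for d in list2}  # one bucket per text in list2
    no_match = []
    for d in list1:
        if d['text'] in matched:
            matched[d['text']].append(d)
        else:
            no_match.append(d)
    # A list2 text whose bucket stayed empty never appeared in list1
    no_match.extend(d for d in list2 if not matched[d['text']])
    return matched, no_match


list1 = [{'text': 'dog', 'number': 10}, {'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}, {'text': 'fish'}]
matched, no_match = split_matches(list1, list2)
print(no_match)
# [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}, {'text': 'fish'}]
```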

Answer:

I would use set arithmetic in the following way:

list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}] 
list2 = [{'text': 'dog'}]
texts1 = set(i['text'] for i in list1) 
texts2 = set(i['text'] for i in list2)
texts = texts1.symmetric_difference(texts2)
list_notmatch1 = [{"text":i} for i in texts]
list_notmatch2 = [i for i in list1 + list2 if i['text'] in texts]
print(list_notmatch1)
print(list_notmatch2)

output

[{'text': 'horse'}, {'text': 'cat'}]
[{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]

Explanation: I create a set of texts from each list, then use symmetric_difference, which is documented as:

Return the symmetric difference of two sets as a new set.

(i.e. all elements that are in exactly one of the sets.)

Then texts can be used either to build the first format directly, or to filter the concatenation of list1 and list2 to get the second format.
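`list1 + list2` builds a new combined list just to iterate over it once. A sketch of the same filter using `itertools.chain`, which walks both lists in turn without copying them; the result is identical.

```python
from itertools import chain

list1 = [{'text': 'dog', 'number': 10}, {'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]

texts = {i['text'] for i in list1} ^ {i['text'] for i in list2}
# chain() yields list1's items, then list2's, without building a concatenated copy
list_notmatch2 = [i for i in chain(list1, list2) if i['text'] in texts]
print(list_notmatch2)  # [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
```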

Answer:

You can try this (it returns the dictionaries from list1 whose text does not appear in list2):

list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}] 

list2 = [{'text': 'dog'}]

result = []
for d1 in list1:
    if not any(d2['text'] == d1['text'] for d2 in list2):
        result.append(d1)
print(result)

Output:

[{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
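With 10,000+ dictionaries, the `any()` call inside the loop rescans list2 for every element of list1, which is O(len(list1) × len(list2)). Building a set of list2's texts once makes each membership check O(1) on average. A sketch with the same one-directional behavior:

```python
list1 = [{'text': 'dog', 'number': 10}, {'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]

# Build the lookup set once instead of scanning list2 on every iteration
texts2 = {d2['text'] for d2 in list2}
result = [d1 for d1 in list1 if d1['text'] not in texts2]
print(result)  # [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
```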