How do I compare two lists of dictionaries?

Time:08-18

I'm working on a problem that involves comparing two lists of dictionaries:

a = [{"colA":"red", "colB":"red", "colC":1},{"colA":"grape", "colB":"orange", "colC":4},{"colA":"tan", "colB":"mustard", "colC":3}]  
b =  [{"colA":"red", "colB":"red", "colC":1},{"colA":"red", "colB":"red", "colC":1},{"colA":"red", "colB":"red", "colC":1, "colD": 3}] 

1.) What's an efficient way to compare the two lists and see how many dictionaries in "a" match dictionaries in "b"? (I might have 1 million dictionaries in the list.)

2.) For a single list, how can I check how many duplicate dictionaries it contains?

CodePudding user response:

Maybe try this. It handles exact matches; for partial matches you would need to modify the dictionary-matching function.

a = [{"colA":"red", "colB":"red", "colC":1},{"colA":"grape", "colB":"orange", "colC":4},{"colA":"tan", "colB":"mustard", "colC":3}]  
b =  [{"colA":"red", "colB":"red", "colC":1},{"colA":"red", "colB":"red", "colC":1},{"colA":"red", "colB":"red", "colC":1, "colD": 3}]

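def modify(data):
    """Map each dictionary, flattened to a tuple of sorted keys and values, to its count."""
    result = {}
    for i in data:
        # Sort the keys so equal dictionaries always produce the same tuple.
        values = []
        for k in sorted(i.keys()):
            values.extend([k, i[k]])
        values = tuple(values)
        if values not in result:
            result[values] = 0
        # Count every occurrence of this dictionary.
        result[values] += 1
    return result


modified_a = modify(a)
modified_b = modify(b)

# For each dictionary present in both lists, count the smaller number of occurrences.
common = sum(min(modified_a[i], modified_b[i]) for i in modified_a if i in modified_b)
print(common)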
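
The same counting idea can also be sketched with collections.Counter from the standard library. This is only a sketch, assuming every dictionary is flat and every value is hashable; count_dicts is a hypothetical helper name, not something from the question.

```python
from collections import Counter


def count_dicts(data):
    """Count occurrences of each dictionary, keyed by its sorted (key, value) pairs."""
    return Counter(tuple(sorted(d.items())) for d in data)


a = [
    {"colA": "red", "colB": "red", "colC": 1},
    {"colA": "grape", "colB": "orange", "colC": 4},
    {"colA": "tan", "colB": "mustard", "colC": 3},
]
b = [
    {"colA": "red", "colB": "red", "colC": 1},
    {"colA": "red", "colB": "red", "colC": 1},
    {"colA": "red", "colB": "red", "colC": 1, "colD": 3},
]

counts_a = count_dicts(a)
counts_b = count_dicts(b)

# "&" on two Counters keeps the minimum count for every key they share.
common = sum((counts_a & counts_b).values())
print(common)  # 1
```

Each list is scanned once, so the work grows roughly linearly with the number of dictionaries rather than with the product of the two list lengths.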

CodePudding user response:

Python sets are a feasible way to solve this problem. Convert each list of dictionaries into a set of tuples (they have to be tuples, since the dict_items object that items() returns is not hashable and so cannot be stored in a set):

# Sort the items so key insertion order does not affect the comparison.
set_a = {tuple(sorted(dict_.items())) for dict_ in a}
set_b = {tuple(sorted(dict_.items())) for dict_ in b}

To see the dictionaries of a that are also in b (each one represented as a tuple of tuples):

set_a.intersection(set_b)

To check how many duplicates are within one list:

len(a) - len(set_a)

Sets do not store repeated entries, so if any item in a is repeated, the difference will be greater than 0.
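
One caveat when turning dictionaries into hashable keys: tuple(d.items()) follows insertion order, so two equal dictionaries built in a different key order give different tuples. A minimal sketch of an order-independent key, assuming flat dictionaries with hashable values, uses frozenset instead; dict_key and count_duplicates are hypothetical helper names.

```python
def dict_key(d):
    """Return an order-independent, hashable key for a flat dictionary."""
    return frozenset(d.items())


def count_duplicates(dicts):
    """Number of entries that repeat an earlier dictionary in the list."""
    return len(dicts) - len({dict_key(d) for d in dicts})


x = {"colA": "red", "colC": 1}
y = {"colC": 1, "colA": "red"}  # same content, different insertion order

print(x == y)                                # True
print(tuple(x.items()) == tuple(y.items()))  # False
print(dict_key(x) == dict_key(y))            # True

b = [
    {"colA": "red", "colB": "red", "colC": 1},
    {"colA": "red", "colB": "red", "colC": 1},
    {"colA": "red", "colB": "red", "colC": 1, "colD": 3},
]
print(count_duplicates(b))  # 1
```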

CodePudding user response:

Based on the information given, here's an answer (albeit primitive) that I put together.

a = [
        { "colA": "red", "colB": "red", "colC": 1 },
        { "colA": "grape", "colB": "orange", "colC": 4 },
        { "colA": "tan", "colB": "mustard", "colC": 3 }
    ]  
b =  [
        { "colA": "red", "colB": "red", "colC": 1 },
        { "colA": "red", "colB": "red", "colC": 1 },
        { "colA": "red", "colB": "red", "colC": 1, "colD": 3}
    ]

# Entries of a that also appear in b (each "in" check scans the whole list).
a_to_b_matches: list = []
for entry in a:
    if entry in b:
        a_to_b_matches.append(entry)

# Entries of a that repeat an earlier entry of a.
a_list_dict_duplicates: list = []
a_temp: list = []
for entry in a:
    if entry in a_temp:
        a_list_dict_duplicates.append(entry)
    else:
        a_temp.append(entry)

# Entries of b that repeat an earlier entry of b.
b_list_dict_duplicates: list = []
b_temp: list = []
for entry in b:
    if entry in b_temp:
        b_list_dict_duplicates.append(entry)
    else:
        b_temp.append(entry)

CodePudding user response:

If your data set is very large, I think using pandas is a good idea:

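import pandas as pd

df_a = pd.DataFrame(a)
df_b = pd.DataFrame(b)
# Compare only the columns that both frames share.
cols = list(set(df_a.columns.values) & set(df_b.columns.values))
# Boolean Series: True for each row of a that also appears in b.
df_a[cols].apply(tuple, axis=1).isin(df_b[cols].apply(tuple, axis=1))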