Merging two lists of dictionaries on non-distinct values

Suppose I have two lists of dictionaries:

v = [{'call 1': 'debit card'},
 {'call 2': 'debit card'},
 {'call 3': 'payment limit'},
 {'call 1': 'bond'},
 {'call 2': 'mortgage'},
 {'call 3': 'debit card'},
 {'call 1': nan},
 {'call 2': 'spending limit'},
 {'call 3': nan}]

and

w = [{'cluster 1': 'payment limit'},
 {'cluster 2': 'debit card'},
 {'cluster 3': 'bond'},
 {'cluster 1': 'spending limit'},
 {'cluster 2': 'debit card'},
 {'cluster 3': 'mortgage'},
 {'cluster 1': None},
 {'cluster 2': 'debit card'},
 {'cluster 3': None}]

I want to drop the null values from both lists and merge the two on the values of the dictionaries, so that I get:

# desired outcome 
    [{'call 3':{'cluster 1': 'payment limit'}},
     {'call 1':{'cluster 2': 'debit card'}},
     {'call 1':{'cluster 3': 'bond'}},
     {'call 2':{'cluster 1': 'spending limit'}},
     {'call 2':{'cluster 2': 'debit card'}},
     {'call 3':{'cluster 2': 'debit card'}}]

The part that puzzles me is how to assign the calls to each cluster. As you can see, 'debit card' appears in call 1, call 2 and call 3, so in general I should be able to assign a distinct key to each cluster.

CodePudding user response：

A simple way to do it:

v = [{'call 1': 'debit card'},
 {'call 2': 'debit card'},
 {'call 3': 'payment limit'},
 {'call 1': 'bond'},
 {'call 2': 'mortgage'},
 {'call 3': 'debit card'},
 {'call 1': None},
 {'call 2': 'spending limit'},
 {'call 3': None}]

w = [{'cluster 1': 'payment limit'},
 {'cluster 2': 'debit card'},
 {'cluster 3': 'bond'},
 {'cluster 1': 'spending limit'},
 {'cluster 2': 'debit card'},
 {'cluster 3': 'mortgage'},
 {'cluster 1': None},
 {'cluster 2': 'debit card'},
 {'cluster 3': None}]

def join(d1, d2):
    # Step 1: drop every entry whose value is None
    updated_d1 = []
    for entry in d1:
        for key, value in entry.items():
            if value is not None:
                updated_d1.append({key: value})
    updated_d2 = []
    for entry in d2:
        for key, value in entry.items():
            if value is not None:
                updated_d2.append({key: value})
    # Step 2: invert each list into {value: [keys that had that value]}
    d1_dict = {}
    for entry in updated_d1:
        for key, value in entry.items():
            d1_dict.setdefault(value, []).append(key)
    d2_dict = {}
    for entry in updated_d2:
        for key, value in entry.items():
            d2_dict.setdefault(value, []).append(key)
    # Step 3: for each value present in both, pair every call with every
    # cluster, skipping combinations that were already added
    ls_results = []
    for value, calls in d1_dict.items():
        if value in d2_dict:
            for call in calls:
                for cluster in d2_dict[value]:
                    pair = {call: {cluster: value}}
                    if pair not in ls_results:
                        ls_results.append(pair)
    return ls_results
print(join(v, w))

Output:

[
    {'call 1': {'cluster 2': 'debit card'}}, 
    {'call 2': {'cluster 2': 'debit card'}}, 
    {'call 3': {'cluster 2': 'debit card'}}, 
    {'call 3': {'cluster 1': 'payment limit'}}, 
    {'call 1': {'cluster 3': 'bond'}}, 
    {'call 2': {'cluster 3': 'mortgage'}}, 
    {'call 2': {'cluster 1': 'spending limit'}}
]

What exactly is being done?

Step 1. First, all dictionaries with None values are removed.

Step 2. Two new dictionaries are built in which the original values become the keys, and each key maps to the list of all original keys that had that value.

Step 3. All that's left is to find the keys that appear in both dictionaries and add every possible combination of their values, skipping duplicates.
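
The same three steps can also be sketched more compactly. This is only an illustration, not a replacement for the function above: the names is_null, invert and join_compact are made up here, and the sketch assumes that a "null" can be either None (as in the answer's data) or a float NaN (as in the question's data). Duplicate keys are dropped while inverting, so no duplicate check is needed when building the result.

```python
import math
from collections import defaultdict


def is_null(value):
    # Assumption: a missing value is either None or a float NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def invert(records):
    # Steps 1 and 2: skip nulls, then map each value to the distinct keys
    # that carried it, in first-seen order.
    inverted = defaultdict(list)
    for record in records:
        for key, value in record.items():
            if not is_null(value) and key not in inverted[value]:
                inverted[value].append(key)
    return inverted


def join_compact(calls, clusters):
    calls_by_value = invert(calls)
    clusters_by_value = invert(clusters)
    # Step 3: pair every call with every cluster that shares the value.
    return [
        {call: {cluster: value}}
        for value, call_keys in calls_by_value.items()
        if value in clusters_by_value
        for call in call_keys
        for cluster in clusters_by_value[value]
    ]


calls = [{'call 1': 'debit card'}, {'call 2': float('nan')}, {'call 3': 'bond'}]
clusters = [{'cluster 2': 'debit card'}, {'cluster 2': 'debit card'},
            {'cluster 3': 'bond'}, {'cluster 1': None}]
print(join_compact(calls, clusters))
# [{'call 1': {'cluster 2': 'debit card'}}, {'call 3': {'cluster 3': 'bond'}}]
```

Called on the v and w lists from the answer, join_compact should return the same list, in the same order, as join.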