Unexpected duplicates when merging values and keys in a dictionary

I have been stuck on this problem for several days. If anyone can provide a hint, I would appreciate it!

Description:

I have a dictionary, but I want to merge some of its values and keys, for example:

Input:
initial_dict = {'aa': ['AA'],'bb':['BB'],'BB':['MM'],'cc':['dd'],'dd':['GG','HH','LL']}

Desired output:
goal_dict = {'aa': ['AA'],'bb':['BB','MM'],'cc':['dd','GG','HH','LL']}

In other words, if a key already appears among the values of an earlier key, its own values should be appended to that earlier key's list (skipping any value that is already there) instead of getting a separate entry. (This may not be very clear, but see the input and output above.)

I have the following code, which I thought was correct, but it produces a lot of duplicates in the output:

dict_head_conj_pair = {'aa': ['AA'],'bb':['BB'],'BB':['MM'],'cc':['dd'],'dd':['GG','HH','LL']}
dict_head_conj_new = {}
print(dict_head_conj_pair)
for a_key in dict_head_conj_pair.keys():
    if dict_head_conj_new:
        if a_key not in dict_head_conj_new.keys():
            for idx_value, new_value_list in enumerate(dict_head_conj_new.copy().values()):
                if a_key in new_value_list:
                    for idx_key, new_key in enumerate(dict_head_conj_new.copy().keys()):
                        if idx_key == idx_value:
                            target_key = new_key
                            for con_0 in range(len(dict_head_conj_pair[a_key])):
                                for new_value_list_1 in dict_head_conj_new.copy().values():
                                    dict_head_conj_new[target_key].append(dict_head_conj_pair[a_key][con_0])
                else:
                    for con_1 in range(len(dict_head_conj_pair[a_key])):
                        if con_1 == 0:
                            dict_head_conj_new.setdefault(a_key, []).append(dict_head_conj_pair[a_key][con_1])
                        else:
                            dict_head_conj_new[a_key].append(dict_head_conj_pair[a_key][con_1])
    else:
        for con_2 in range(len(dict_head_conj_pair[a_key])):
            if con_2==0:
                dict_head_conj_new.setdefault(a_key, []).append(dict_head_conj_pair[a_key][con_2])
            else:
                dict_head_conj_new[a_key].append(dict_head_conj_pair[a_key][con_2])
print("dict_head_conj_new: ",dict_head_conj_new)

Current undesired output:

dict_head_conj_new:  {'aa': ['AA'], 'bb': ['BB', 'MM', 'MM', 'MM'], 'BB': ['MM'], 'cc': ['dd', 'dd', 'dd', 'GG', 'GG', 'GG', 'GG', 'GG', 'HH', 'HH', 'HH', 'HH', 'HH', 'LL', 'LL', 'LL', 'LL', 'LL'], 'dd': ['GG', 'HH', 'LL', 'GG', 'HH', 'LL', 'GG', 'HH', 'LL']}

If anyone can see where I went wrong or provide a hint on how to get my desired result, I would much appreciate it!

Thanks!

CodePudding user response：

some_list = [1, 1, 1, 2, 3]
some_list = list(dict.fromkeys(some_list))

You could run a debugger to find out what is causing the problem (which would be good practice), but if you just need to remove the duplicates, the code above might work: it removes duplicates from a list while keeping the original order.
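
As a rough sketch of how that could be applied to the output shown in the question (the names `noisy` and `deduped` are made up for illustration), each list can be de-duplicated in turn. Note that this only removes repeated values; the extra 'BB' and 'dd' keys are still there, so it is a partial fix at best:

```python
# The undesired output from the question, copied as-is.
noisy = {
    'aa': ['AA'],
    'bb': ['BB', 'MM', 'MM', 'MM'],
    'BB': ['MM'],
    'cc': ['dd', 'dd', 'dd', 'GG', 'GG', 'GG', 'GG', 'GG',
           'HH', 'HH', 'HH', 'HH', 'HH', 'LL', 'LL', 'LL', 'LL', 'LL'],
    'dd': ['GG', 'HH', 'LL', 'GG', 'HH', 'LL', 'GG', 'HH', 'LL'],
}

# dict.fromkeys keeps the first occurrence of each item, in order (Python 3.7+).
deduped = {key: list(dict.fromkeys(values)) for key, values in noisy.items()}
print(deduped)
```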

Also, I have a feeling that graph theory algorithms might be relevant here: if a value in the dictionary is itself a key, you can look it up and keep following the links, sort of "spidering into" a tree.
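
To make the "spidering" idea a bit more concrete, here is a minimal sketch using only plain Python. It treats every key that never appears inside a value list as a root and walks depth-first from there. The names `collect`, `roots` and `merged` are my own, and I am assuming there is at least one such root (if every key is part of a cycle, nothing is produced):

```python
initial_dict = {'aa': ['AA'], 'bb': ['BB'], 'BB': ['MM'], 'cc': ['dd'], 'dd': ['GG', 'HH', 'LL']}


def collect(key, graph, seen):
    """Depth-first walk: list each value of key, then whatever that value leads to."""
    out = []
    for value in graph.get(key, []):
        if value not in seen:  # skip values already collected (also guards against cycles)
            seen.add(value)
            out.append(value)
            out.extend(collect(value, graph, seen))
    return out


# A key is a root if it never shows up inside any value list.
all_values = {v for values in initial_dict.values() for v in values}
roots = [k for k in initial_dict if k not in all_values]

merged = {root: collect(root, initial_dict, {root}) for root in roots}
print(merged)
```

For the dictionary in the question this prints {'aa': ['AA'], 'bb': ['BB', 'MM'], 'cc': ['dd', 'GG', 'HH', 'LL']}. Unlike a single pass over the items, it does not depend on the order in which the keys were inserted.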

CodePudding user response：

If I understood correctly, you can model this as a graph problem, so I suggest you use networkx:

import operator

import networkx as nx

initial_dict = {'aa': ['AA'], 'bb': ['BB'], 'BB': ['MM'], 'cc': ['dd'], 'dd': ['GG', 'HH', 'LL']}
dg = nx.convert.from_dict_of_lists(initial_dict, create_using=nx.DiGraph)

# seeds are the nodes with no incoming edges; they become the keys of the result
seeds = [n for n in dg.nodes if not dg.in_degree(n)]


def descendants(g, source):
    """This function finds all the descendants of source in g sorted by distance to source"""
    desc = sorted(nx.shortest_path_length(g, source).items(), key=operator.itemgetter(1))
    return [d for d, _ in desc if d != source]


# return as dictionary
res = {seed: descendants(dg, seed) for seed in seeds}
print(res)

Output

   {'aa': ['AA'], 'bb': ['BB', 'MM'], 'cc': ['dd', 'HH', 'GG', 'LL']}

CodePudding user response：

This works for your data inasmuch as it produces your desired output. It is potentially inefficient for very large dictionaries (because of the need to search for values), but it is certainly a lot less code than in your original question.

initial_dict = {'aa': ['AA'],'bb':['BB'],'BB':['MM'],'cc':['dd'],'dd':['GG','HH','LL']}
goal_dict = dict()

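def find_value(val):
    # Return the key in goal_dict whose list already contains val, or None.
    for k, v in goal_dict.items():
        if val in v:
            return k
    return None

for k, v in initial_dict.items():
    # Compare with None so that a falsy key such as '' or 0 still counts as found.
    if (_k := find_value(k)) is not None:
        # k already appears as a value under _k, so fold its values into _k.
        goal_dict[_k] += v
    elif k not in goal_dict:
        # Copy the list so that extending it later does not mutate initial_dict.
        goal_dict[k] = list(v)

print(goal_dict)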
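
If the search in find_value ever becomes a bottleneck, one option is to keep a second dictionary that remembers which key each value was filed under, so the lookup no longer scans every list. This is only a sketch under the same assumption as the loop above, namely that a key always comes after the list that mentions it; `merge_chains`, `merged` and `owner` are names I made up:

```python
initial_dict = {'aa': ['AA'], 'bb': ['BB'], 'BB': ['MM'], 'cc': ['dd'], 'dd': ['GG', 'HH', 'LL']}


def merge_chains(source):
    merged = {}  # root key -> merged list of values
    owner = {}   # value -> root key whose list already contains it
    for key, values in source.items():
        # If key was already filed as a value, reuse that root; otherwise key is its own root.
        root = owner.get(key, key)
        bucket = merged.setdefault(root, [])
        for value in values:
            # Only add values that have not been filed anywhere yet.
            if value not in owner and value != root:
                owner[value] = root
                bucket.append(value)
    return merged


print(merge_chains(initial_dict))
```

This builds new lists instead of reusing the ones in initial_dict, so the input is left untouched, and a value that has already been filed under some key is not added a second time.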