A weird duplication when merging values and keys in a dictionary

Time:10-21

I have been stuck on a problem for several days. If anyone can provide a hint, I would appreciate it!

Description:

I have a dictionary, but I want to merge some of its values and keys, for example:

Input:
initial_dict = {'aa': ['AA'],'bb':['BB'],'BB':['MM'],'cc':['dd'],'dd':['GG','HH','LL']}

Desired output:
goal_dict = {'aa': ['AA'],'bb':['BB','MM'],'cc':['dd','GG','HH','LL']}

In other words, if a key already appears in the value list of an earlier key, its values should be appended to that earlier key's list (skipping any that are already there), and it should no longer appear as a separate key. This may not be entirely clear, so see the input and output above.

I have the following code. I thought it was correct, but the output contains a lot of duplicates:

dict_head_conj_pair = {'aa': ['AA'],'bb':['BB'],'BB':['MM'],'cc':['dd'],'dd':['GG','HH','LL']}
dict_head_conj_new = {}
print(dict_head_conj_pair)
for a_key in dict_head_conj_pair.keys():
    if dict_head_conj_new:
        if a_key not in dict_head_conj_new.keys():
            for idx_value, new_value_list in enumerate(dict_head_conj_new.copy().values()):
                if a_key in new_value_list:
                    for idx_key, new_key in enumerate(dict_head_conj_new.copy().keys()):
                        if idx_key == idx_value:
                            target_key = new_key
                            for con_0 in range(len(dict_head_conj_pair[a_key])):
                                for new_value_list_1 in dict_head_conj_new.copy().values():
                                    dict_head_conj_new[target_key].append(dict_head_conj_pair[a_key][con_0])
                else:
                    for con_1 in range(len(dict_head_conj_pair[a_key])):
                        if con_1 == 0:
                            dict_head_conj_new.setdefault(a_key, []).append(dict_head_conj_pair[a_key][con_1])
                        else:
                            dict_head_conj_new[a_key].append(dict_head_conj_pair[a_key][con_1])
    else:
        for con_2 in range(len(dict_head_conj_pair[a_key])):
            if con_2==0:
                dict_head_conj_new.setdefault(a_key, []).append(dict_head_conj_pair[a_key][con_2])
            else:
                dict_head_conj_new[a_key].append(dict_head_conj_pair[a_key][con_2])
print("dict_head_conj_new: ",dict_head_conj_new)

Current undesired output:

dict_head_conj_new:  {'aa': ['AA'], 'bb': ['BB', 'MM', 'MM', 'MM'], 'BB': ['MM'], 'cc': ['dd', 'dd', 'dd', 'GG', 'GG', 'GG', 'GG', 'GG', 'HH', 'HH', 'HH', 'HH', 'HH', 'LL', 'LL', 'LL', 'LL', 'LL'], 'dd': ['GG', 'HH', 'LL', 'GG', 'HH', 'LL', 'GG', 'HH', 'LL']}

If anyone can see where I went wrong or give a hint on how to get the desired result, I would much appreciate it!

Thanks!

CodePudding user response:

some_list = [1, 1, 1, 2, 3]
some_list = list(dict.fromkeys(some_list))

You could run a debugger to find what is causing the problem (which might be a good idea), but if you just need to remove duplicates from a list, the code above should work: dict.fromkeys keeps the first occurrence of each element and preserves the original order.
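As a rough illustration, here is the same trick applied to every list in the duplicated output from the question. The name noisy is made up for this example, and the data is copied from the question's "Current undesired output". Note that this only cleans up the lists: the 'BB' and 'dd' keys are still there, so it does not by itself produce the desired result.

```python
# The duplicated result reported in the question, copied verbatim.
# `noisy` is a hypothetical name used only for this illustration.
noisy = {
    'aa': ['AA'],
    'bb': ['BB', 'MM', 'MM', 'MM'],
    'BB': ['MM'],
    'cc': ['dd', 'dd', 'dd', 'GG', 'GG', 'GG', 'GG', 'GG',
           'HH', 'HH', 'HH', 'HH', 'HH', 'LL', 'LL', 'LL', 'LL', 'LL'],
    'dd': ['GG', 'HH', 'LL', 'GG', 'HH', 'LL', 'GG', 'HH', 'LL'],
}

# dict.fromkeys keeps the first occurrence of each item, in order.
deduped = {key: list(dict.fromkeys(values)) for key, values in noisy.items()}
print(deduped)
```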

Also, I have a feeling that graph theory algorithms might be relevant here: if a value is also a key in the dictionary, you can follow it to its own values, and in that way sort of "spider into" a tree.
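The "spider into a tree" idea can be sketched in plain Python with a breadth-first walk. This is only a sketch under some assumptions: merge_chains is a name invented here, a key counts as a starting point only if it never appears in any value list, and a dictionary where every key is referenced somewhere (a pure cycle) would give an empty result.

```python
from collections import deque

initial_dict = {'aa': ['AA'], 'bb': ['BB'], 'BB': ['MM'], 'cc': ['dd'], 'dd': ['GG', 'HH', 'LL']}


def merge_chains(mapping):
    """Fold every key that appears in a value list into the key that reaches it."""
    # A key that shows up in some value list is reachable from another key,
    # so it does not start a chain of its own.
    referenced = {v for values in mapping.values() for v in values}
    merged = {}
    for root in mapping:
        if root in referenced:
            continue
        seen = {root}
        order = []
        queue = deque(mapping[root])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue  # skip repeats and guard against cycles
            seen.add(node)
            order.append(node)
            # If this value is itself a key, keep following its values.
            queue.extend(mapping.get(node, []))
        merged[root] = order
    return merged


print(merge_chains(initial_dict))
```

Because each node is recorded at most once per starting key, the repeated 'MM' and 'GG' entries from the question cannot occur.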

CodePudding user response:

If I understood correctly, you can model this as a graph problem, so I suggest you use networkx:

import operator

import networkx as nx

initial_dict = {'aa': ['AA'], 'bb': ['BB'], 'BB': ['MM'], 'cc': ['dd'], 'dd': ['GG', 'HH', 'LL']}
dg = nx.convert.from_dict_of_lists(initial_dict, create_using=nx.DiGraph)

# nodes with no incoming edges; these become the keys of the result
seeds = [n for n in dg.nodes if not dg.in_degree(n)]


def descendants(g, source):
    """This function finds all the descendants of source in g sorted by distance to source"""
    desc = sorted(nx.shortest_path_length(g, source).items(), key=operator.itemgetter(1))
    return [d for d, _ in desc if d != source]


# return as dictionary
res = {seed: descendants(dg, seed) for seed in seeds}
print(res)

Output

   {'aa': ['AA'], 'bb': ['BB', 'MM'], 'cc': ['dd', 'HH', 'GG', 'LL']}

CodePudding user response:

This works for your data inasmuch as it produces your desired output. It is potentially inefficient for very large dictionaries (because of the need to search for values), but it is certainly a lot less code than in your original question.

initial_dict = {'aa': ['AA'],'bb':['BB'],'BB':['MM'],'cc':['dd'],'dd':['GG','HH','LL']}
goal_dict = dict()

def find_value(val):
    for k, v in goal_dict.items():
        if val in v:
            return k

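for k, v in initial_dict.items():
    if (_k := find_value(k)):
        # k is already a value of an earlier key: extend that key's list
        goal_dict[_k] += v
    elif k not in goal_dict:
        # copy the list so that += above never modifies initial_dict
        goal_dict[k] = list(v)

print(goal_dict)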
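If the linear search in find_value becomes a problem for large dictionaries, one possible way around it is to keep a second dictionary that remembers which key each value ended up under. The following is a sketch, not a drop-in replacement: merge_with_index and owner are names invented here, and it assumes (like the loop above) that a key appears in an earlier key's value list before it is processed itself. It also skips any value that has already been placed somewhere, which is a design choice you may or may not want.

```python
initial_dict = {'aa': ['AA'], 'bb': ['BB'], 'BB': ['MM'], 'cc': ['dd'], 'dd': ['GG', 'HH', 'LL']}


def merge_with_index(mapping):
    merged = {}
    owner = {}  # value -> key in `merged` whose list holds that value
    for key, values in mapping.items():
        # If this key was already absorbed as a value, extend its owner's list;
        # otherwise the key gets a list of its own.
        target = owner.get(key, key)
        bucket = merged.setdefault(target, [])
        for value in values:
            if value not in owner:  # dictionary lookup instead of a list scan
                owner[value] = target
                bucket.append(value)
    return merged


print(merge_with_index(initial_dict))
```

The result lists are built from scratch, so initial_dict is left untouched.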