I have been stuck on a question for several days. If anyone can provide a hint, I would appreciate it!
Description:
I have a dictionary, but I want to merge some of its values and keys, for example:
Input:
initial_dict = {'aa': ['AA'],'bb':['BB'],'BB':['MM'],'cc':['dd'],'dd':['GG','HH','LL']}
Desired output:
goal_dict = {'aa': ['AA'],'bb':['BB','MM'],'cc':['dd','GG','HH','LL']}
In other words, if a key already appears among the values of an earlier key, its values should be appended to that earlier key's list (skipping any that are already there) instead of being kept as a separate entry. See the input and output above.
I wrote the following code, which I thought was correct, but it produces a lot of duplicated output:
dict_head_conj_pair = {'aa': ['AA'],'bb':['BB'],'BB':['MM'],'cc':['dd'],'dd':['GG','HH','LL']}
dict_head_conj_new = {}
print(dict_head_conj_pair)
for a_key in dict_head_conj_pair.keys():
    if dict_head_conj_new:
        if a_key not in dict_head_conj_new.keys():
            for idx_value, new_value_list in enumerate(dict_head_conj_new.copy().values()):
                if a_key in new_value_list:
                    for idx_key, new_key in enumerate(dict_head_conj_new.copy().keys()):
                        if idx_key == idx_value:
                            target_key = new_key
                            for con_0 in range(len(dict_head_conj_pair[a_key])):
                                for new_value_list_1 in dict_head_conj_new.copy().values():
                                    dict_head_conj_new[target_key].append(dict_head_conj_pair[a_key][con_0])
                else:
                    for con_1 in range(len(dict_head_conj_pair[a_key])):
                        if con_1 == 0:
                            dict_head_conj_new.setdefault(a_key, []).append(dict_head_conj_pair[a_key][con_1])
                        else:
                            dict_head_conj_new[a_key].append(dict_head_conj_pair[a_key][con_1])
    else:
        for con_2 in range(len(dict_head_conj_pair[a_key])):
            if con_2==0:
                dict_head_conj_new.setdefault(a_key, []).append(dict_head_conj_pair[a_key][con_2])
            else:
                dict_head_conj_new[a_key].append(dict_head_conj_pair[a_key][con_2])
print("dict_head_conj_new: ",dict_head_conj_new)
Current undesired output:
dict_head_conj_new: {'aa': ['AA'], 'bb': ['BB', 'MM', 'MM', 'MM'], 'BB': ['MM'], 'cc': ['dd', 'dd', 'dd', 'GG', 'GG', 'GG', 'GG', 'GG', 'HH', 'HH', 'HH', 'HH', 'HH', 'LL', 'LL', 'LL', 'LL', 'LL'], 'dd': ['GG', 'HH', 'LL', 'GG', 'HH', 'LL', 'GG', 'HH', 'LL']}
If anyone can see where I went wrong or can provide a hint on how to get my desired result, I would much appreciate it!
Thanks!
CodePudding user response:
some_list = [1, 1, 1, 2, 3]
some_list = list(dict.fromkeys(some_list))
You could run a debugger to find out what is causing the problem (which is probably worthwhile), but if you just need to remove duplicates from a list, the code above might work. It keeps the first occurrence of each item and preserves the order.
I also have a feeling that graph algorithms are relevant here: if a value in one of the lists is itself a key of the dictionary, you can follow it to that key's list and keep going, "spidering" through the dictionary like a tree.
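A rough sketch of that "spider into" idea, using only the standard library, might look like the following. The function name merge_chains is made up for illustration, and the sketch assumes the dictionary has no cycles (a key caught in a cycle would be treated as absorbed and would never show up as a root). A dict is used as an ordered set, so each item is kept once, in the order it is first reached.

```python
def merge_chains(d):
    """Fold the list of every key that appears as a value into the key that reaches it.

    Hypothetical helper, assuming `d` maps strings to lists of strings
    and contains no cycles.
    """
    # Keys that show up inside some value list get absorbed by another key.
    absorbed = {v for values in d.values() for v in values if v in d}
    merged = {}
    for key in d:
        if key in absorbed:
            continue
        seen = {}  # dict used as an ordered set (insertion order is preserved)
        stack = list(reversed(d[key]))
        while stack:
            item = stack.pop()
            if item == key or item in seen:
                continue  # skip duplicates
            seen[item] = None
            # If the item is itself a key, follow it into its own list.
            stack.extend(reversed(d.get(item, [])))
        merged[key] = list(seen)
    return merged


initial_dict = {'aa': ['AA'], 'bb': ['BB'], 'BB': ['MM'], 'cc': ['dd'], 'dd': ['GG', 'HH', 'LL']}
print(merge_chains(initial_dict))
# {'aa': ['AA'], 'bb': ['BB', 'MM'], 'cc': ['dd', 'GG', 'HH', 'LL']}
```

Because the traversal is depth-first, a followed key's values are placed right after that key in the merged list, which happens to match the order in the desired output.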
CodePudding user response:
If I understood correctly, you can model this as a graph problem, so I suggest you use networkx:
import operator
import networkx as nx
initial_dict = {'aa': ['AA'], 'bb': ['BB'], 'BB': ['MM'], 'cc': ['dd'], 'dd': ['GG', 'HH', 'LL']}
dg = nx.convert.from_dict_of_lists(initial_dict, create_using=nx.DiGraph)
# seeds are the nodes with no incoming edges, i.e. the keys that never appear as a value
seeds = [n for n in dg.nodes if not dg.in_degree(n)]
def descendants(g, source):
    """This function finds all the descendants of source in g sorted by distance to source"""
    desc = sorted(nx.shortest_path_length(g, source).items(), key=operator.itemgetter(1))
    return [d for d, _ in desc if d != source]
# return as dictionary
res = {seed: descendants(dg, seed) for seed in seeds}
print(res)
Output
{'aa': ['AA'], 'bb': ['BB', 'MM'], 'cc': ['dd', 'HH', 'GG', 'LL']}
CodePudding user response:
This works for your data, in that it produces your desired output. It is potentially inefficient for very large dictionaries (because it has to search through the values each time), but it is certainly a lot less code than in your original question.
initial_dict = {'aa': ['AA'],'bb':['BB'],'BB':['MM'],'cc':['dd'],'dd':['GG','HH','LL']}
goal_dict = dict()
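def find_value(val):
    # return the key in goal_dict whose list already contains val, else None
    for k, v in goal_dict.items():
        if val in v:
            return k
for k, v in initial_dict.items():
    if (_k := find_value(k)) is not None:
        # k already appears as a value: extend that list instead of replacing it
        goal_dict[_k] += v
    elif k not in goal_dict:
        # copy so that extending goal_dict later does not modify initial_dict
        goal_dict[k] = list(v)
print(goal_dict)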
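If the repeated search through the values ever becomes a bottleneck, one possible variation is to remember which merged key each value ended up under, so the lookup becomes a single dict access. This is only a sketch under the same assumption as the loop above, namely that a key is visited after the list that mentions it. The names merge_with_index, owner and seen are invented for illustration.

```python
def merge_with_index(d):
    """Merge lists as above, but track value -> root key in a dict instead of searching.

    Hypothetical helper; like the loop above, it relies on the dictionary's
    insertion order (a key must come after the list that contains it).
    """
    merged = {}  # root key -> merged list
    owner = {}   # value -> root key whose list first received it
    seen = {}    # root key -> set of values already in its list
    for key, values in d.items():
        # If this key was already seen as a value, fold it into that root's list.
        root = owner.get(key, key)
        bucket = merged.setdefault(root, [])
        root_seen = seen.setdefault(root, set())
        for value in values:
            if value not in root_seen:  # skip duplicates within one merged list
                root_seen.add(value)
                bucket.append(value)
                owner.setdefault(value, root)
    return merged


initial_dict = {'aa': ['AA'], 'bb': ['BB'], 'BB': ['MM'], 'cc': ['dd'], 'dd': ['GG', 'HH', 'LL']}
print(merge_with_index(initial_dict))
# {'aa': ['AA'], 'bb': ['BB', 'MM'], 'cc': ['dd', 'GG', 'HH', 'LL']}
```

The trade-off is some extra memory for the two helper dicts in exchange for not scanning every list on each iteration. The input dictionary is left untouched because the merged lists are built from scratch.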