Appending to Dictionary: Preventing Duplicates in both key and value

temp_dict = dict()
for i in df['Incident_Type'].unique():
    temp_df = pd.DataFrame()
    col2 = []
    col3 = []
    for j in df['Incident_Type'].unique():
        if i!=j:
            v = nlp(i).similarity(nlp(j))
            col2.append(j)
            col3.append(v)
        else:
            continue
    idx = col3.index(np.max(col3))
    temp_dict[i] = col2[idx]

In the code above, temp_dict ends up with entries such as {A: B, B: A}. Since A and B have already been compared and stored as A: B, how can I avoid comparing them again and storing B: A?

One more thing: when comparing A with B and B with C, if B and C turn out to be more closely related to each other, we only want B: C. A will then have some other element more closely related to it.

CodePudding user response：

You want every key: item pair to be unique, regardless of order, right? If that's the case, you can do the following:

The method dict.items() returns a dict_items object, a view of the dictionary's keys and values that yields one (key, value) tuple per entry, so you can test the pairs you want directly against the tuples it returns:

(i, j) not in temp_dict.items()
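
As a small illustration of that membership test, here is a sketch with a made-up dictionary. The helper name pair_seen is hypothetical and only wraps the two checks for convenience:

```python
# Hypothetical dictionary, just to show how the tuple test behaves.
temp_dict = {"A": "B"}

# dict.items() yields (key, value) tuples, so a tuple can be tested directly.
print(("A", "B") in temp_dict.items())  # True: key "A" maps to "B"
print(("B", "A") in temp_dict.items())  # False: there is no key "B"


def pair_seen(d, i, j):
    """Return True if i and j are already stored as a pair, in either order."""
    return (i, j) in d.items() or (j, i) in d.items()


print(pair_seen(temp_dict, "B", "A"))  # True: found as ("A", "B")
```

Checking both orders is what makes B: A count as a duplicate of A: B.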

In your code, it could look something like this:


temp_dict = dict()
for i in df['Incident_Type'].unique():
    col2 = []  # candidate partners for i
    col3 = []  # similarity of i to each candidate
    for j in df['Incident_Type'].unique():
        # skip i itself and any pair already stored in either order
        if i != j and (i, j) not in temp_dict.items() and (j, i) not in temp_dict.items():
            v = nlp(i).similarity(nlp(j))
            col2.append(j)
            col3.append(v)
    if col3:  # every candidate may have been filtered out, leaving nothing to pick
        idx = col3.index(np.max(col3))
        temp_dict[i] = col2[idx]
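
The second requirement in the question (keep B: C when B and C are closer to each other than A and B are) could also be approached by scoring every unordered pair once and then choosing pairs greedily, best score first. The following is only a sketch of that idea, not the method from the answer above. The names pair_most_similar and toy_similarity are hypothetical, and the toy similarity stands in for nlp(a).similarity(nlp(b)) so the example runs without spaCy:

```python
from itertools import combinations


def pair_most_similar(items, similarity):
    """Greedily pair items so that each item appears in at most one pair.

    `similarity` is any function taking two items and returning a number;
    in the question it would be lambda a, b: nlp(a).similarity(nlp(b)).
    """
    # combinations() yields each unordered pair exactly once,
    # so (B, A) is never scored after (A, B).
    scored = [(similarity(a, b), a, b) for a, b in combinations(items, 2)]
    # Highest similarity first; sorting on the score alone keeps ties in input order.
    scored.sort(key=lambda t: t[0], reverse=True)

    pairs = {}
    used = set()
    for _score, a, b in scored:
        if a in used or b in used:
            continue  # one of them already has a closer partner
        pairs[a] = b
        used.update((a, b))
    return pairs


# Toy similarity (an assumption for the demo): closer numbers are more similar.
def toy_similarity(a, b):
    return -abs(a - b)


print(pair_most_similar([1, 5, 6, 20], toy_similarity))  # {5: 6, 1: 20}
```

Here 5 and 6 are paired first because they are the closest, so 1 falls back to the only item still free. Greedy selection is simple but does not promise the best overall set of pairs, and with an odd number of items one is left unpaired.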

CodePudding user response：

With sample data like the following, I hope this helps:

import pandas as pd

# Sample DataFrame
data = [{"Incident_Type": 'A'}, {"Incident_Type": 'D'}, {"Incident_Type": 'B'}, {"Incident_Type": 'A'}, {"Incident_Type": 'C'}, {"Incident_Type": 'B'}, {"Incident_Type": 'A'},
{"Incident_Type": 'D'}, {"Incident_Type": 'X'}, {"Incident_Type": 'B'}, {"Incident_Type": 'A'}, {"Incident_Type": 'Z'}, {"Incident_Type": 'B'}, {"Incident_Type": 'A'}]
df = pd.DataFrame(data)

print(df)
print('--------------------------------------')

temp_dict = {}
for i in df['Incident_Type'].unique():
    for j in df['Incident_Type'].unique():
        # pair i with j only if neither is already used as a key or a value
        if i != j and i not in temp_dict and i not in temp_dict.values() and j not in temp_dict and j not in temp_dict.values():
            temp_dict[i] = j
print(temp_dict)
'''
   Incident_Type
0              A
1              D
2              B
3              A
4              C
5              B
6              A
7              D
8              X
9              B
10             A
11             Z
12             B
13             A
--------------------------------------
{'A': 'D', 'B': 'C', 'X': 'Z'}
'''