temp_dict = dict()
for i in df['Incident_Type'].unique():
    temp_df = pd.DataFrame()
    col2 = []
    col3 = []
    for j in df['Incident_Type'].unique():
        if i != j:
            v = nlp(i).similarity(nlp(j))
            col2.append(j)
            col3.append(v)
        else:
            continue
    idx = col3.index(np.max(col3))
    temp_dict[i] = col2[idx]
In the above code, the values are added to temp_dict as {A: B, B: A}. Since A and B have already been compared and added, how can I prevent B: A from being compared and added again?
One more thing: when comparing A:B and B:C, if B and C are found to be more closely related to each other, we just want B:C. A will then have some other element that is more closely related to it.
CodePudding user response:
You want every key: item pair to be unique, no matter the order of its elements, right? If that's the case, you can do the following:
The method dict.items() returns a dict_items object, which is a view of the dictionary's keys and items made up of one tuple per key: item pair, so you can check whether a given pair is among the tuples it returns:
(i, j) not in temp_dict.items()
In your code, it could be something like this:
temp_dict = dict()
for i in df['Incident_Type'].unique():
    temp_df = pd.DataFrame()
    col2 = []
    col3 = []
    for j in df['Incident_Type'].unique():
        # Skip j if the pair is already stored in either order
        if i != j and (i, j) not in temp_dict.items() and (j, i) not in temp_dict.items():
            v = nlp(i).similarity(nlp(j))
            col2.append(j)
            col3.append(v)
        else:
            continue
    # col3 is empty when every other value already maps to i
    if col3:
        idx = col3.index(np.max(col3))
        temp_dict[i] = col2[idx]
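Another way to avoid comparing B with A after A has been compared with B is to visit each unordered pair only once. The following is a minimal sketch of that idea using itertools.combinations. The similarity function is a hypothetical stand-in built on difflib.SequenceMatcher so the example runs without a spaCy model, and the sample labels are made up; in the code above it would be replaced by nlp(a).similarity(nlp(b)).

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    # Hypothetical stand-in for nlp(a).similarity(nlp(b)); returns a score in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def pairwise_scores(labels):
    # combinations() yields each unordered pair exactly once,
    # so (A, B) is scored but (B, A) is never visited.
    return {frozenset((a, b)): similarity(a, b) for a, b in combinations(labels, 2)}


labels = ["fire", "fires", "theft", "flood"]  # made-up sample values
scores = pairwise_scores(labels)
print(len(scores))  # 4 labels give 6 unordered pairs
```

Using frozenset as the key means a lookup works regardless of the order in which the two values are given.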
CodePudding user response:
With sample data like the one below, I hope this helps:
import pandas as pd

# Sample DataFrame
data = [{"Incident_Type": 'A'}, {"Incident_Type": 'D'}, {"Incident_Type": 'B'}, {"Incident_Type": 'A'}, {"Incident_Type": 'C'}, {"Incident_Type": 'B'}, {"Incident_Type": 'A'},
        {"Incident_Type": 'D'}, {"Incident_Type": 'X'}, {"Incident_Type": 'B'}, {"Incident_Type": 'A'}, {"Incident_Type": 'Z'}, {"Incident_Type": 'B'}, {"Incident_Type": 'A'}]
df = pd.DataFrame(data)
print(df)
print('--------------------------------------')
temp_dict = {}
for i in df['Incident_Type'].unique():
    for j in df['Incident_Type'].unique():
        # Pair i with j only if neither value is already part of a pair
        if i != j and i not in temp_dict.keys() and i not in temp_dict.values() and j not in temp_dict.keys() and j not in temp_dict.values():
            temp_dict[i] = j
print(temp_dict)
'''
   Incident_Type
0              A
1              D
2              B
3              A
4              C
5              B
6              A
7              D
8              X
9              B
10             A
11             Z
12             B
13             A
--------------------------------------
{'A': 'D', 'B': 'C', 'X': 'Z'}
'''
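The loop above pairs values in order of appearance and does not look at a similarity score. If the pairs should instead follow the score, so that B:C wins over A:B when B and C are closer, one possible approach is a greedy pass over all pairs sorted from most to least similar. This is only a sketch under assumptions: similarity is a hypothetical stand-in for nlp(a).similarity(nlp(b)) based on difflib.SequenceMatcher, closest_unique_pairs is a name introduced here for illustration, and the sample labels are made up. A greedy pass is simple but is not guaranteed to maximise the total similarity over all pairs.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    # Hypothetical stand-in for nlp(a).similarity(nlp(b)).
    return SequenceMatcher(None, a, b).ratio()


def closest_unique_pairs(labels, score=similarity):
    # Score every unordered pair once, then take pairs from best to worst,
    # skipping any pair that reuses an already matched label.
    ranked = sorted(combinations(labels, 2), key=lambda p: score(*p), reverse=True)
    used, pairs = set(), {}
    for a, b in ranked:
        if a not in used and b not in used:
            pairs[a] = b
            used.update((a, b))
    return pairs


labels = ["fire", "fires", "flood", "floods"]  # made-up sample values
print(closest_unique_pairs(labels))
```

With the real data, labels would be df['Incident_Type'].unique() and score would wrap the spaCy call. With an odd number of values, one of them is left without a partner.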