I have a similarity matrix stored in a NumPy array that looks like this:
What is the most efficient way to convert it to a DataFrame that looks like this?
This is my current code, but I don't think it is the most efficient:
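import numpy as np
import pandas as pd
sim_npy = np.load('sim.npy')
# df_tsc is assumed to exist already; these column names are illustrative
df_tsc = pd.DataFrame(columns=['Source', 'Destination', 'Similarity_Score'])
row = 0
for i in range(100):
    for j in range(100):
        # one .loc enlargement per cell, i.e. the DataFrame grows one row at a time
        df_tsc.loc[row] = ['Item' + str(i), 'Item' + str(j), str(sim_npy[i][j])]
        row = row + 1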
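For comparison, here is a minimal vectorized sketch of what the loop above produces: every (i, j) pair in the same row-major order, with 0-based `Item` labels. It assumes a small hard-coded matrix in place of `np.load('sim.npy')`, borrows the `Source`/`Destination`/`Similarity_Score` column names from the answer's output, and keeps the scores as floats instead of converting them with `str()`.

```python
import numpy as np
import pandas as pd

# Stand-in for np.load('sim.npy'); any square similarity matrix works here.
sim_npy = np.array([[1.0, 0.3, 0.5],
                    [0.3, 1.0, 0.7],
                    [0.5, 0.7, 1.0]])
n = sim_npy.shape[0]

# Every (i, j) pair in the same order as the nested loop:
# i = 0,0,0,1,1,1,2,2,2   j = 0,1,2,0,1,2,0,1,2
i = np.repeat(np.arange(n), n)
j = np.tile(np.arange(n), n)

df_tsc = pd.DataFrame({
    'Source': 'Item' + pd.Series(i).astype(str),
    'Destination': 'Item' + pd.Series(j).astype(str),
    # ravel() flattens row by row, matching the (i, j) order above
    'Similarity_Score': sim_npy.ravel(),
})
print(df_tsc)
```

Each column is built in a single pass and the DataFrame is constructed once, so no row is appended individually, which is the usual bottleneck of a `.loc` loop.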
Answer:
Use NumPy to improve performance. Because the matrix is symmetric, the code keeps only the upper triangle (including the diagonal):
import numpy as np
import pandas as pd
a = np.array([[1, 0.3, 0.5], [0.3, 1, 0.7], [0.5, 0.7, 1]])
print(a)
# [[1.  0.3 0.5]
#  [0.3 1.  0.7]
#  [0.5 0.7 1. ]]
# indices of the upper triangle, including the diagonal (k=0)
idx = np.triu_indices(a.shape[0], k=0)
# 1-based row and column numbers for every cell, filtered to the upper triangle
c = np.repeat(np.arange(1, a.shape[0] + 1), a.shape[0]).reshape(a.shape)[idx]
i = np.tile(np.arange(1, a.shape[0] + 1), a.shape[0]).reshape(a.shape)[idx]
# similarity values in the upper triangle
v = a[idx]
# create the DataFrame
df = pd.DataFrame({'Source': c, 'Destination': i, 'Similarity_Score': v})
# prefix the numbers with 'Item'
df[['Source', 'Destination']] = 'Item' + df[['Source', 'Destination']].astype(str)
print (df)
#   Source Destination  Similarity_Score
# 0  Item1       Item1               1.0
# 1  Item1       Item2               0.3
# 2  Item1       Item3               0.5
# 3  Item2       Item2               1.0
# 4  Item2       Item3               0.7
# 5  Item3       Item3               1.0
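A slightly shorter variant of the same idea, offered as a sketch: `np.triu_indices` already returns the row and column positions of the upper-triangle cells, so the `np.repeat`/`np.tile` grids are not strictly needed. The `Item` labels stay 1-based to match the output above.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 0.3, 0.5], [0.3, 1, 0.7], [0.5, 0.7, 1]])

# triu_indices returns (row_positions, col_positions) of every upper-triangle
# cell in row-major order; k=0 keeps the diagonal.
rows, cols = np.triu_indices(a.shape[0], k=0)

df = pd.DataFrame({
    'Source': 'Item' + pd.Series(rows + 1).astype(str),       # 1-based labels
    'Destination': 'Item' + pd.Series(cols + 1).astype(str),
    'Similarity_Score': a[rows, cols],                         # fancy indexing
})
print(df)
```

Passing `k=1` instead of `k=0` drops the diagonal (each item's similarity with itself) from the result.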
Answer:
You can refer to this link:
It might help you. Thanks.