Convert Similarity Score Matrix to Pandas DataFrame

I have a similarity matrix stored in a NumPy array that looks like this:

What is the most efficient way to convert it to a DataFrame that looks like this?

This is my current code, but I don't think it is the most efficient:

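import numpy as np
import pandas as pd

sim_npy = np.load('sim.npy')
# df_tsc must exist before the loop; these column names are assumed
df_tsc = pd.DataFrame(columns=['Source', 'Destination', 'Similarity_Score'])
row = 0
for i in range(100):
    for j in range(100):
        df_tsc.loc[row] = ['Item' + str(i), 'Item' + str(j), sim_npy[i][j]]
        row = row + 1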
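A vectorized sketch of the same all-pairs output as the loop above (every (i, j) combination, 0-based Item labels). It assumes the column names Source, Destination and Similarity_Score used in the answer below, and uses a small hard-coded matrix in place of np.load('sim.npy'); matrix_to_long is a hypothetical helper name, not part of NumPy or pandas.

```python
import numpy as np
import pandas as pd


def matrix_to_long(sim, prefix='Item'):
    """Flatten an n x n similarity matrix into one row per (i, j) pair.

    Row order matches the nested loop: i varies slowest, j fastest.
    Labels are 0-based (Item0, Item1, ...), as in the loop.
    """
    n = sim.shape[0]
    # Row index: each value repeated n times -> 0,0,0,1,1,1,...
    src = np.repeat(np.arange(n), n)
    # Column index: the whole range cycled n times -> 0,1,2,0,1,2,...
    dst = np.tile(np.arange(n), n)
    labels = np.array([f'{prefix}{k}' for k in range(n)])
    return pd.DataFrame({
        'Source': labels[src],
        'Destination': labels[dst],
        # ravel() is row-major, the same order as (src, dst)
        'Similarity_Score': sim.ravel(),
    })


# Small stand-in for np.load('sim.npy')
sim = np.array([[1, 0.3, 0.5], [0.3, 1, 0.7], [0.5, 0.7, 1]])
df = matrix_to_long(sim)
print(df)
```

np.repeat and np.tile generate the row and column index of each cell in the same row-major order that ravel() uses, so the three columns line up without any Python-level loop or per-row .loc assignment.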

Answer:

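Use NumPy to improve performance: build each column as an array and create the DataFrame in a single call instead of filling it row by row.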

import numpy as np
import pandas as pd

a = np.array([[1, 0.3, 0.5], [0.3, 1, 0.7], [0.5, 0.7, 1]])
print(a)
[[1.  0.3 0.5]
 [0.3 1.  0.7]
 [0.5 0.7 1. ]]

# indices of the upper triangle, including the diagonal (k=0)
idx = np.triu_indices(a.shape[0], k=0)

# build 1-based source/destination numbers, keep only upper-triangle cells
c = np.repeat(np.arange(1, a.shape[0] + 1), a.shape[0]).reshape(a.shape)[idx]
i = np.tile(np.arange(1, a.shape[0] + 1), a.shape[0]).reshape(a.shape)[idx]
# similarity values at the same indices
v = a[idx]

#create DataFrame
df = pd.DataFrame({'Source':c, 'Destination': i, 'Similarity_Score': v})
# add the 'Item' prefix to the numbers
df[['Source', 'Destination']] = 'Item' + df[['Source', 'Destination']].astype(str)
print(df)
  Source Destination  Similarity_Score
0  Item1       Item1               1.0
1  Item1       Item2               0.3
2  Item1       Item3               0.5
3  Item2       Item2               1.0
4  Item2       Item3               0.7
5  Item3       Item3               1.0
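Because np.triu_indices already returns the row and column index of every upper-triangle cell, the repeat/tile/reshape step can probably be skipped. A sketch under that reading, using the same example matrix; the 1-based Item labels and column names are taken from the answer above:

```python
import numpy as np
import pandas as pd

a = np.array([[1, 0.3, 0.5], [0.3, 1, 0.7], [0.5, 0.7, 1]])

# Row and column indices of the upper triangle, diagonal included (k=0)
rows, cols = np.triu_indices(a.shape[0], k=0)

df = pd.DataFrame({
    # triu_indices is 0-based; add 1 to get Item1, Item2, ...
    'Source': 'Item' + pd.Series(rows + 1).astype(str),
    'Destination': 'Item' + pd.Series(cols + 1).astype(str),
    # Fancy indexing picks the values at the same (row, col) pairs
    'Similarity_Score': a[rows, cols],
})
print(df)
```

Passing k=1 instead of k=0 would drop the diagonal (self-similarity) rows.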

Answer:

You can refer to this link:

https://www.geeksforgeeks.org/create-a-dataframe-from-a-numpy-array-and-specify-the-index-column-and-column-headers/

This might help you. Thanks.
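Judging by its title, the linked article covers building a DataFrame from a NumPy array with explicit index and column headers. One way to connect that to the question, sketched here as an assumption rather than the article's own code: label the matrix with item names on both axes, then stack it into long form. The Item labels and column names follow the first answer.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 0.3, 0.5], [0.3, 1, 0.7], [0.5, 0.7, 1]])
labels = [f'Item{k}' for k in range(1, a.shape[0] + 1)]

# Wide (matrix) form: item labels as both the index and the column headers
wide = pd.DataFrame(a, index=labels, columns=labels)
wide.index.name = 'Source'
wide.columns.name = 'Destination'

# Long form: stack() turns every (row, column) cell into one row,
# and reset_index() moves the two label levels back into columns
long_df = wide.stack().reset_index(name='Similarity_Score')
print(long_df)
```

This keeps all n*n pairs, like the loop in the question; limiting it to the upper triangle would need an extra filtering step.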