There is a data frame:
data = pd.DataFrame({'o':[1,1,2,1,2], 'd':[1,2,1,3,3], 'distance':[3,5,10,6,8]})
I want to average the third column when the indexes of the first two columns are the same or reversed. For example, if
o = 1, d = 2, distance = 5
oro = 2, d = 1, distance = 10
, there iso = 1, d = 2, distance = 7.5
oro = 2, d = 1, distance = 7.5
.Next, according to this three columns dataframe, I want to generate a symmetric matrix, where
o
andd
is index and column, and the value isdistance
. In addition, the diagonal of the matrix is set to 0.The ideal result is:
a = np.array([[0,7.5,6], [7.5,0,8], [6,8,0]])
CodePudding user response:
import pandas as pd
import numpy as np
data = pd.DataFrame({'o':[1,1,2,1,2],
'd':[1,2,1,3,3],
'distance':[3,5,10,6,8]})
data['query'] = data.apply(lambda row: tuple(sorted([row.o, row.d])), axis=1) # ignore order
avg = data.groupby('query').distance.mean()
N = max(data.o.max(), data.d.max()) # square matrix
result = np.zeros((N, N), dtype=np.float32)
for (row, col), v in avg.items():
row, col = row - 1, col -1 # index start from 0
if row == col: continue # diagonal set to 0
result[row][col] = result[col][row] = v # symmetric
print(result)
CodePudding user response:
Use:
#assigned sorted values to columns back
data[['o','d']] = np.sort(data[['o','d']], axis=1)
#pivoting with aggregate mean
df = data.pivot_table(index='o',columns='d', values='distance', aggfunc='mean')
#create matrix
df = df.combine_first(df.T)
#set 0 to diagonal
np.fill_diagonal(df.to_numpy(), 0)
print (df)
1 2 3
1 0.0 7.5 6.0
2 7.5 0.0 8.0
3 6.0 8.0 0.0