Let's say I have a correlation matrix df.corr():
          A         B         C
A  1.000000  0.500670  0.429114
B  0.500670  1.000000  0.392397
C  0.429114  0.392397  1.000000
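The sketch below rebuilds the matrix above as a DataFrame so the example can be reproduced. The name `corr` is a hypothetical stand-in for the result of `df.corr()` on the real data.

```python
import pandas as pd

# Hard-coded copy of the correlation matrix shown above (illustrative only);
# with real data this would simply be corr = df.corr().
corr = pd.DataFrame(
    [[1.000000, 0.500670, 0.429114],
     [0.500670, 1.000000, 0.392397],
     [0.429114, 0.392397, 1.000000]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)
print(corr)
```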
I would like to sort the correlations in descending order, so that the output shows something like:
- A/B -> 0.5
- A/C -> 0.43
- B/C -> 0.39
The thing is, I want to avoid hard-coding this with a for loop and instead do it efficiently (I'm dealing with a lot of data in my project). Is there a pandas function for this, or is there a better approach? Would you mind sharing some code?
A numpy approach:
import numpy as np
import pandas as pd

# df is the original data; corr is its correlation matrix
corr = df.corr()

# convert to a NumPy array
corr_np = corr.to_numpy()
# row/column indices of the upper triangle, excluding the diagonal (k=1)
rows, cols = np.triu_indices_from(corr_np, k=1)
# pick out the corresponding values as a 1-D array
flat = corr_np[rows, cols]
# build a "row/column" label for each pair
labels = corr.columns[rows] + "/" + corr.columns[cols]
# argsort is ascending, so reverse it to get descending order
indices = np.argsort(flat)[::-1]
# create a Series for the result
res = pd.Series(data=flat[indices], index=labels[indices])
print(res)
Output
A/B 0.500670
A/C 0.429114
B/C 0.392397
dtype: float64
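If you'd rather stay within pandas, here is a minimal alternative sketch: mask out the diagonal and lower triangle with `where`, flatten the remaining values with `stack`, and sort with `sort_values`. The `corr` DataFrame is hard-coded from the question's values as an assumption (use `df.corr()` with real data). The explicit `.dropna()` keeps the result the same whether or not the installed pandas version's `stack()` drops NaN on its own. The final loop prints in the "A/B -> 0.5" style from the question, rounded to two decimals.

```python
import numpy as np
import pandas as pd

# Example correlation matrix (values from the question); replace with df.corr().
corr = pd.DataFrame(
    [[1.000000, 0.500670, 0.429114],
     [0.500670, 1.000000, 0.392397],
     [0.429114, 0.392397, 1.000000]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

# Boolean mask that is True strictly above the diagonal.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Keep only the upper triangle (everything else becomes NaN), stack it into a
# Series indexed by (row, column) pairs, drop the NaNs, and sort descending.
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

# Turn the (row, column) MultiIndex into "row/column" labels.
pairs.index = [f"{a}/{b}" for a, b in pairs.index]

for label, value in pairs.items():
    print(f"{label} -> {value:.2f}")
# A/B -> 0.50
# A/C -> 0.43
# B/C -> 0.39
```

Both versions avoid Python-level loops over the matrix. Note that the number of pairs grows as n(n-1)/2 for n columns, so if only the top few pairs are needed, partitioning the flat values with `np.argpartition` before sorting may avoid a full sort.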