Sorting correlation matrix

Time:08-14

I want to convert a correlation matrix into a pandas table in which each column is sorted from the largest value to the smallest, as in the image. How can I do this?

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,15,size=(20, 6)), columns=["Ply_1","Ply_2","Ply_3","Ply_4","Ply_5","Ply_6"])
df['date'] = pd.date_range('2000-1-1', periods=20, freq='D')
df = df.set_index(['date'])
cor = df.corr()
print(cor)

[Image: desired output]

CodePudding user response:

pd.concat([cor[col_name].sort_values(ascending=False).rename_axis(col_name.replace('Ply', 'index')).reset_index() for col_name in cor], 
          axis=1)

With my randomly generated numbers:

   index_1      Ply_1  index_2       Ply_2  index_3       Ply_3  index_4      Ply_4  index_5      Ply_5  index_6      Ply_6
0    Ply_1          1    Ply_2           1    Ply_3           1    Ply_4          1    Ply_5          1    Ply_6          1
1    Ply_2   0.387854    Ply_1    0.387854    Ply_1    0.258825    Ply_1   0.337613    Ply_4  0.0618012    Ply_1   0.058282
2    Ply_4   0.337613    Ply_4    0.293496    Ply_4   0.0552454    Ply_2   0.293496    Ply_2   0.060881    Ply_3  -0.207621
3    Ply_3   0.258825    Ply_5    0.060881    Ply_2  -0.0900126    Ply_5  0.0618012    Ply_3  -0.110885    Ply_2   -0.22012
4    Ply_6   0.058282    Ply_3  -0.0900126    Ply_5   -0.110885    Ply_3  0.0552454    Ply_1  -0.390893    Ply_4  -0.291842
5    Ply_5  -0.390893    Ply_6    -0.22012    Ply_6   -0.207621    Ply_6  -0.291842    Ply_6  -0.394074    Ply_5  -0.394074
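
The one-liner relies on every column name containing 'Ply'. A minimal sketch of the same idea for arbitrary column names could look like this; the helper name sorted_corr_table, the positional index_i labels, and the fixed seed are assumptions made for illustration and are not part of the original answer:

```python
import numpy as np
import pandas as pd


def sorted_corr_table(cor: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sort each column of `cor` in descending order
    and place the (label, value) pairs side by side."""
    parts = []
    for i, col_name in enumerate(cor.columns, start=1):
        part = (
            cor[col_name]
            .sort_values(ascending=False)   # largest correlation first
            .rename_axis(f"index_{i}")      # name the label column by position
            .reset_index()                  # labels become a regular column
        )
        parts.append(part)
    # Each part has a fresh 0..n-1 index, so the rows line up by rank.
    return pd.concat(parts, axis=1)


rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pd.DataFrame(
    rng.integers(0, 15, size=(20, 6)),
    columns=[f"Ply_{i}" for i in range(1, 7)],
)
table = sorted_corr_table(df.corr())
print(table)
```

Because each column correlates perfectly with itself, the first row of every index_i column should hold that column's own name.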

Explanation:

pd.concat([cor[col_name]
               .sort_values(ascending=False)
               .rename_axis(col_name.replace('Ply', 'index'))
               .reset_index()
           for col_name in cor],
          axis=1)
  • pd.concat([df_1, ..., df_6], axis=1) concatenates 6 dataframes (each one will be already sorted and will have 2 columns: ‘index_i’ and ‘Ply_i’).

  • [cor[col_name] for col_name in cor] would create a list of 6 Series, where each Series is the next column of cor.

  • ser.sort_values(ascending=False) sorts the values of a Series ser in descending order (each index label moves with its value).

  • col_name.replace('Ply', 'index') creates a new string from a string col_name by replacing 'Ply' with 'index'.

  • ser.rename_axis(name).reset_index() renames the index axis and moves the index (with its new name) into a regular column, converting the Series into a DataFrame. The new index of this DataFrame is the default range index (from 0 to 5).

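
The individual steps can be traced on a single column. This is only a sketch: the 3x3 matrix below is hand-written to stand in for cor, and the names step1, step2 and step3 are introduced here for illustration.

```python
import pandas as pd

# Hand-written 3x3 symmetric matrix standing in for `cor` (illustrative values only).
cor = pd.DataFrame(
    [[1.0, 0.4, -0.2],
     [0.4, 1.0, 0.1],
     [-0.2, 0.1, 1.0]],
    index=["Ply_1", "Ply_2", "Ply_3"],
    columns=["Ply_1", "Ply_2", "Ply_3"],
)

ser = cor["Ply_2"]                         # one column as a Series
step1 = ser.sort_values(ascending=False)   # labels travel with their values
step2 = step1.rename_axis("index_2")       # give the index axis a name
step3 = step2.reset_index()                # index becomes a column; new index is 0..2

print(list(step1.index))     # ['Ply_2', 'Ply_1', 'Ply_3']
print(list(step3.columns))   # ['index_2', 'Ply_2']
print(list(step3.index))     # [0, 1, 2]
```

Since every per-column DataFrame ends up with the same default range index, pd.concat(..., axis=1) can place them side by side without any realignment.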