I want to convert a correlation matrix into a pandas table in which each column is sorted from the largest value to the smallest, as in the image. How can I do this?
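import numpy as np
import pandas as pd

# 20 rows of random integers in [0, 15) for six columns, indexed by date
df = pd.DataFrame(np.random.randint(0, 15, size=(20, 6)), columns=["Ply_1", "Ply_2", "Ply_3", "Ply_4", "Ply_5", "Ply_6"])
df['date'] = pd.date_range('2000-1-1', periods=20, freq='D')
df = df.set_index(['date'])
cor = df.corr()
print(cor)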
CodePudding user response:
pd.concat([cor[col_name].sort_values(ascending=False).rename_axis(col_name.replace('Ply', 'index')).reset_index() for col_name in cor],
          axis=1)
With my randomly generated numbers:
| | index_1 | Ply_1 | index_2 | Ply_2 | index_3 | Ply_3 | index_4 | Ply_4 | index_5 | Ply_5 | index_6 | Ply_6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ply_1 | 1 | Ply_2 | 1 | Ply_3 | 1 | Ply_4 | 1 | Ply_5 | 1 | Ply_6 | 1 |
| 1 | Ply_2 | 0.387854 | Ply_1 | 0.387854 | Ply_1 | 0.258825 | Ply_1 | 0.337613 | Ply_4 | 0.0618012 | Ply_1 | 0.058282 |
| 2 | Ply_4 | 0.337613 | Ply_4 | 0.293496 | Ply_4 | 0.0552454 | Ply_2 | 0.293496 | Ply_2 | 0.060881 | Ply_3 | -0.207621 |
| 3 | Ply_3 | 0.258825 | Ply_5 | 0.060881 | Ply_2 | -0.0900126 | Ply_5 | 0.0618012 | Ply_3 | -0.110885 | Ply_2 | -0.22012 |
| 4 | Ply_6 | 0.058282 | Ply_3 | -0.0900126 | Ply_5 | -0.110885 | Ply_3 | 0.0552454 | Ply_1 | -0.390893 | Ply_4 | -0.291842 |
| 5 | Ply_5 | -0.390893 | Ply_6 | -0.22012 | Ply_6 | -0.207621 | Ply_6 | -0.291842 | Ply_6 | -0.394074 | Ply_5 | -0.394074 |
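For anyone who wants to reproduce a table like this, here is a minimal self-contained sketch. The fixed seed and the helper name `sorted_corr_table` are assumptions added for illustration; they are not part of the question or the one-liner above, and the numbers will differ from the table shown.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed so the run is repeatable (the question uses unseeded data).
rng = np.random.default_rng(0)
cols = [f"Ply_{i}" for i in range(1, 7)]
df = pd.DataFrame(rng.integers(0, 15, size=(20, 6)), columns=cols)
cor = df.corr()


def sorted_corr_table(cor):
    """Hypothetical helper: one (index_i, Ply_i) column pair per column of `cor`."""
    parts = [
        cor[col_name]
        .sort_values(ascending=False)                    # largest correlation first
        .rename_axis(col_name.replace("Ply", "index"))   # name the index, e.g. "index_1"
        .reset_index()                                   # move the labels into a column
        for col_name in cor                              # iterating a DataFrame yields column names
    ]
    return pd.concat(parts, axis=1)                      # place the six pairs side by side


table = sorted_corr_table(cor)
print(table)
```

Wrapping the expression in a function is only a convenience; the body is the same chain as the one-liner.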
Explanation:
pd.concat([cor[col_name]
               .sort_values(ascending=False)
               .rename_axis(col_name.replace('Ply', 'index'))
               .reset_index()
           for col_name in cor],
          axis=1)
- `pd.concat([df_1, ..., df_6], axis=1)` concatenates 6 DataFrames side by side (each one is already sorted and has 2 columns: `index_i` and `Ply_i`).
- `[cor[col_name] for col_name in cor]` would create a list of 6 Series, where each Series is the next column of `cor`.
- `ser.sort_values(ascending=False)` sorts the values of a Series `ser` in descending order (the index labels move with their values).
- `col_name.replace('Ply', 'index')` creates a new string from the string `col_name` by replacing 'Ply' with 'index'.
- `ser.rename_axis(name).reset_index()` renames the index axis and extracts the index (with its name) as a new column, converting the Series into a DataFrame. The new index of this DataFrame is the default range index (from 0 to 5).
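To see the individual steps on a single column, here is a small sketch. The Series `ser` is a hypothetical stand-in for one column of `cor`, with made-up values and only three rows to keep the output short.

```python
import pandas as pd

# Hypothetical stand-in for cor["Ply_2"]; the values are invented for illustration.
ser = pd.Series([0.39, 1.0, -0.09], index=["Ply_1", "Ply_2", "Ply_3"], name="Ply_2")

sorted_ser = ser.sort_values(ascending=False)   # labels travel with their values
new_name = ser.name.replace("Ply", "index")     # "index_2"
pair = sorted_ser.rename_axis(new_name).reset_index()

print(pair)
# Two columns, "index_2" and "Ply_2"; rows in the order Ply_2, Ply_1, Ply_3,
# with a fresh 0..2 range index.
```

Each column of `cor` goes through the same three steps before `pd.concat` places the resulting two-column DataFrames next to each other.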