Home > Net >  How to transform a multi-dimensional symmetric dataframe to a 2-column dataframe without repeating?
How to transform a multi-dimensional symmetric dataframe to a 2-column dataframe without repeating?

Time:12-12

Here is my data. I want to transform this multi-dimensional symmetric dataframe to a 2-column dataframe without repeating.

#coding=utf-8
import pandas as pd
import numpy as np
#########

cor = [('E1',0,0,0.8,-0.8),
         ('E2',0,0,1.0,1.0),
         ('E3',0.8,1.0,0,1.0),
         ('E4',-0.8,1.0,1.0,0)]
label = ['','E1','E2', 'E3', 'E4']
R = pd.DataFrame.from_records(cor, columns=label)
print(R)

My data

I want to transform it to a 2-column matrix without repeating. The main result I want is like:

the expected result

CodePudding user response:

Use DataFrame.stack for reshape (first convert empty string column to index), set new index names by DataFrame.rename_axis and remove duplicates in MultiIndex by mapping frozensets:

R = R.set_index('').stack().rename_axis(['x','y'])

R = R[~R.index.map(frozenset).duplicated()].reset_index(name='cor')
print(R)
    x   y  cor
0  E1  E1  0.0
1  E1  E2  0.0
2  E1  E3  0.8
3  E1  E4 -0.8
4  E2  E2  0.0
5  E2  E3  1.0
6  E2  E4  1.0
7  E3  E3  0.0
8  E3  E4  1.0
9  E4  E4  0.0

Or convert lower triangle values to missing values in DataFrame.where with numpy.triu, so stack remove missing values:

R = R.set_index('')


print(R.where(np.triu(np.ones(R.shape, dtype=bool))))
     E1   E2   E3   E4
                      
E1  0.0  0.0  0.8 -0.8
E2  NaN  0.0  1.0  1.0
E3  NaN  NaN  0.0  1.0
E4  NaN  NaN  NaN  0.0

R = (R.where(np.triu(np.ones(R.shape, dtype=bool)))
      .stack()
      .rename_axis(['x','y'])
      .reset_index(name='cor'))

print(R)
    x   y  cor
0  E1  E1  0.0
1  E1  E2  0.0
2  E1  E3  0.8
3  E1  E4 -0.8
4  E2  E2  0.0
5  E2  E3  1.0
6  E2  E4  1.0
7  E3  E3  0.0
8  E3  E4  1.0
9  E4  E4  0.0
  • Related