How to convert a symmetric matrix to pandas DataFrame?

Time:10-10

The following is a snippet of the symmetric matrix I have (only the lower triangle is filled in). I would like to convert this matrix into a pandas DataFrame to conduct data analysis on it.

       P62258 P61981 P31946 P63104 Q08380
P62258 100
P61981 69.23  100 
P31946 71.95  81.30  100
P63104 75.10  80.81  90.20  100
Q08380 7.45   8.50   7.31   8.16   100
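The answer further down assumes df already holds this matrix, with the protein IDs as both index and columns and NaN in the blank upper-triangle cells. Below is a minimal sketch of one way to build it. It assumes the snippet is available as a string (the variable name raw is hypothetical) and that fields are separated by whitespace.

```python
import numpy as np
import pandas as pd

# Hypothetical: the matrix from the question pasted in as a string.
raw = """\
       P62258 P61981 P31946 P63104 Q08380
P62258 100
P61981 69.23  100
P31946 71.95  81.30  100
P63104 75.10  80.81  90.20  100
Q08380 7.45   8.50   7.31   8.16   100
"""

lines = raw.splitlines()
columns = lines[0].split()  # header row: the protein IDs

rows = {}
for line in lines[1:]:
    name, *values = line.split()
    # Pad the blank upper-triangle cells with NaN so every row has len(columns) entries.
    rows[name] = [float(v) for v in values] + [np.nan] * (len(columns) - len(values))

# orient="index" turns each dict key into a row label.
df = pd.DataFrame.from_dict(rows, orient="index", columns=columns)
print(df)
```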

The following is how I would like the DataFrame to look:

Protein1  Protein2  Similarity
P62258    P62258    100
P61981    P62258    69.23
P31946    P62258    71.95
P63104    P62258    75.10
Q08380    P62258    7.45
P61981    P61981    100
P31946    P61981    81.30
.
.
.

Answer:

Let's try stacking the DataFrame and dropping the empty cells:

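import pandas as pd

# df: protein IDs as (unnamed) index and columns, NaN in the blank upper triangle.
# stack() gives a Series indexed by (row ID, column ID); dropna() discards the blank cells.
out = (df.stack().dropna().reset_index()
       .rename(columns={'level_0': 'Protein1', 'level_1': 'Protein2', 0: 'Similarity'}))
# or: name the value column and the index levels before resetting the index
out = (df.stack().dropna().to_frame('Similarity')
       .rename_axis(index=['Protein1', 'Protein2'])
       .reset_index())
print(out)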

   Protein1 Protein2  Similarity
0    P62258   P62258      100.00
1    P61981   P62258       69.23
2    P61981   P61981      100.00
3    P31946   P62258       71.95
4    P31946   P61981       81.30
5    P31946   P31946      100.00
6    P63104   P62258       75.10
7    P63104   P61981       80.81
8    P63104   P31946       90.20
9    P63104   P63104      100.00
10   Q08380   P62258        7.45
11   Q08380   P61981        8.50
12   Q08380   P31946        7.31
13   Q08380   P63104        8.16
14   Q08380   Q08380      100.00
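The output above is grouped by Protein1, while the target in the question lists the pairs column by column (grouped by Protein2). A minimal sketch of one way to get that exact order, assuming the same lower-triangular df, is to transpose before stacking and then reorder the columns:

```python
import numpy as np
import pandas as pd

ids = ["P62258", "P61981", "P31946", "P63104", "Q08380"]
nan = np.nan
# Same lower-triangular matrix as in the question; NaN marks the blank upper cells.
df = pd.DataFrame(
    [
        [100.00, nan, nan, nan, nan],
        [69.23, 100.00, nan, nan, nan],
        [71.95, 81.30, 100.00, nan, nan],
        [75.10, 80.81, 90.20, 100.00, nan],
        [7.45, 8.50, 7.31, 8.16, 100.00],
    ],
    index=ids,
    columns=ids,
)

# Transposing first makes stack() walk the matrix column by column, so the
# outer index level is the original column (Protein2) and the inner is the row (Protein1).
out = (
    df.T.stack()
    .dropna()
    .to_frame("Similarity")
    .rename_axis(index=["Protein2", "Protein1"])
    .reset_index()[["Protein1", "Protein2", "Similarity"]]
)
print(out)
```

If your matrix is stored with both triangles filled, stacking it directly would list every off-diagonal pair twice; masking it first with df.where(np.tril(np.ones(df.shape, dtype=bool))) keeps only the lower triangle and the diagonal.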