Below is a snippet of my symmetric similarity matrix (only the lower triangle is filled in). I would like to convert it into a long-format pandas dataframe so I can analyze it.
        P62258  P61981  P31946  P63104  Q08380
P62258  100
P61981  69.23   100
P31946  71.95   81.30   100
P63104  75.10   80.81   90.20   100
Q08380  7.45    8.50    7.31    8.16    100
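If the matrix is only available as text like the snippet above, one way to load it into a square DataFrame is to parse it line by line and pad the missing upper triangle with NaN. This is a minimal sketch that assumes whitespace-separated values and a header row of protein IDs; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The pasted lower-triangular matrix (the upper triangle is left blank).
text = """\
P62258 P61981 P31946 P63104 Q08380
P62258 100
P61981 69.23 100
P31946 71.95 81.30 100
P63104 75.10 80.81 90.20 100
Q08380 7.45 8.50 7.31 8.16 100
"""

lines = text.strip().splitlines()
columns = lines[0].split()  # header row: protein IDs
index, rows = [], []
for line in lines[1:]:
    name, *values = line.split()  # first token is the row label
    row = [float(v) for v in values]
    # Pad the blank upper-triangle cells with NaN so every row has full width.
    row += [np.nan] * (len(columns) - len(row))
    index.append(name)
    rows.append(row)

df = pd.DataFrame(rows, index=index, columns=columns)
print(df)
```

The resulting df has NaN above the diagonal, which is the shape the stack()/dropna() approach expects.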
I would like the resulting dataframe to look like this:
Protein1  Protein2  Similarity
P62258    P62258    100
P61981    P62258    69.23
P31946    P62258    71.95
P63104    P62258    75.10
Q08380    P62258    7.45
P61981    P61981    100
P31946    P61981    81.30
...
Answer:
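Stack the dataframe. stack() moves the column labels into an inner index level, giving one value per (row, column) pair, and dropna() removes the blank upper-triangle cells (assuming they were read in as NaN):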
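# stack() turns each column label into an inner index level;
# dropna() discards the empty (NaN) upper-triangle cells.
out = (df.stack().dropna().reset_index()
         .rename(columns={'level_0': 'Protein1', 'level_1': 'Protein2', 0: 'Similarity'}))
# or, naming the index levels before turning them into columns:
out = (df.stack().dropna().to_frame('Similarity')
         .rename_axis(index=['Protein1', 'Protein2'])
         .reset_index())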
print(out)
   Protein1 Protein2  Similarity
0    P62258   P62258      100.00
1    P61981   P62258       69.23
2    P61981   P61981      100.00
3    P31946   P62258       71.95
4    P31946   P61981       81.30
5    P31946   P31946      100.00
6    P63104   P62258       75.10
7    P63104   P61981       80.81
8    P63104   P31946       90.20
9    P63104   P63104      100.00
10   Q08380   P62258        7.45
11   Q08380   P61981        8.50
12   Q08380   P31946        7.31
13   Q08380   P63104        8.16
14   Q08380   Q08380      100.00
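The stack() result is ordered row by row, whereas the question lists the pairs column by column (all Protein2 = P62258 rows first). If that order matters, DataFrame.melt can be used instead, since it emits values one column at a time. This is a sketch that assumes pandas 1.1 or later (for melt's ignore_index parameter); the matrix is rebuilt inline from the question's snippet so the example runs on its own.

```python
import numpy as np
import pandas as pd

ids = ["P62258", "P61981", "P31946", "P63104", "Q08380"]
nan = np.nan
# Lower-triangular similarity matrix from the question, NaN above the diagonal.
df = pd.DataFrame(
    [
        [100.00, nan, nan, nan, nan],
        [69.23, 100.00, nan, nan, nan],
        [71.95, 81.30, 100.00, nan, nan],
        [75.10, 80.81, 90.20, 100.00, nan],
        [7.45, 8.50, 7.31, 8.16, 100.00],
    ],
    index=ids,
    columns=ids,
)

# melt() walks the columns one at a time, so every pair with
# Protein2 = P62258 comes first, then P61981, and so on.
out = (
    df.rename_axis("Protein1")  # name the row labels so reset_index() uses it
      .melt(ignore_index=False, var_name="Protein2", value_name="Similarity")
      .dropna(subset=["Similarity"])  # drop the empty upper triangle
      .reset_index()
)
print(out)
```

Both approaches return the same 15 pairs; only the row order differs.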