I have a dataframe defined as follows:
cols = pd.MultiIndex.from_tuples([("unnamed_0", "b"), ("unnamed_1", "c"),('total','A'),('total','B'),('total','C')])
df = pd.DataFrame([[1,2,3,4,5], [3,4,5,6,7],[5,6,8,9,10],[7,8,11,12,13],[9,19,13,24,25]], columns=cols)
df
which is:
unnamed_0 unnamed_1 total
b c A B C
0 1 2 3 4 5
1 3 4 5 6 7
2 5 6 8 9 10
3 7 8 11 12 13
4 9 19 13 24 25
What I want is to transform this dataframe into:
d = {'b': [1, 1,1,3,3,3,5,5,5,7,7,7,9,9,9], 'c': [2,2,2,4,4,4,6,6,6,8,8,8,19,19,19],'total':[3,4,5,5,6,7,8,9,10,11,12,13,13,24,25],'type':['A','B','C','A','B','C','A','B','C','A','B','C','A','B','C']}
df2 = pd.DataFrame(d)
print(df2)
That is:
b c total type
0 1 2 3 A
1 1 2 4 B
2 1 2 5 C
3 3 4 5 A
4 3 4 6 B
5 3 4 7 C
6 5 6 8 A
7 5 6 9 B
8 5 6 10 C
9 7 8 11 A
10 7 8 12 B
11 7 8 13 C
12 9 19 13 A
13 9 19 24 B
14 9 19 25 C
I tried to drop the two unnamed_
columns at first using
df = df.xs(["unnamed_0","unnamed_1"], axis=1, drop_level=True)
but that didn't go to well as it dropped everything but what I wanted. As for the transformation of level 2 header into column values, I thought of using something like df.T.unstack().reset_index(level=1, name='type').rename(columns={'level_1':'A'})[['type','A']]
as in this example:
df=pd.DataFrame(index=['x','y'], data={'a':[1,2],'b':[3,4]})
a b
x 1 3
y 2 4
df.T.unstack().reset_index(level=1, name='c1').rename(columns={'level_1':'c2'})[['c1','c2']]
c1 c2
x 1 a
x 3 b
y 2 a
y 4 b
but I did not manage to apply it to multi-level header dataframes.
Any help would be appreciated.
CodePudding user response:
Use melt
:
>>> df.droplevel(0, axis=1).melt(['b', 'c'], var_name='type', value_name='total')
b c type total
0 1 2 A 3
1 3 4 A 5
2 5 6 A 8
3 7 8 A 11
4 9 19 A 13
5 1 2 B 4
6 3 4 B 6
7 5 6 B 9
8 7 8 B 12
9 9 19 B 24
10 1 2 C 5
11 3 4 C 7
12 5 6 C 10
13 7 8 C 13
14 9 19 C 45