I have a dataframe where I want to convert the columns headers into aggregated rows and a column into the header:
Athlete Variable 28.0 28.0 28.0 29.0 29.0 29.0
John Minutes 23 32 43 21 43 35
John Distance 23.2 32.7 54.2 24.2 47.8 32.1
John Sprints 5 3 8 3 9 2
Bob Minutes 25 23 43 31 38 23
Bob Distance 32.4 35.6 43.7 21.8 26.8 12.5
Bob Sprints 8 4 8 6 9 2
Desired dataframe:
Athlete week Minutes Distance Sprints
John 28 98 110.1 16
John 29 99 104.1 14
Bob 28 91 111.7 20
Bob 29 92 61.1 17
CodePudding user response:
You’re looking for pivot_table
, but you’ll need to melt
the dataframe to a mono-dimensional format first:
>>> data = df.melt(id_vars=['Athlete', 'Variable'], var_name='week')
>>> data
Athlete Variable week value
0 John Minutes 28.0 23.0
1 John Distance 28.0 23.2
2 John Sprints 28.0 5.0
3 Bob Minutes 28.0 25.0
4 Bob Distance 28.0 32.4
5 Bob Sprints 28.0 8.0
6 John Minutes 28.0 32.0
7 John Distance 28.0 32.7
8 John Sprints 28.0 3.0
9 Bob Minutes 28.0 23.0
10 Bob Distance 28.0 35.6
11 Bob Sprints 28.0 4.0
12 John Minutes 28.0 43.0
13 John Distance 28.0 54.2
14 John Sprints 28.0 8.0
15 Bob Minutes 28.0 43.0
16 Bob Distance 28.0 43.7
17 Bob Sprints 28.0 8.0
18 John Minutes 29.0 21.0
19 John Distance 29.0 24.2
20 John Sprints 29.0 3.0
21 Bob Minutes 29.0 31.0
22 Bob Distance 29.0 21.8
23 Bob Sprints 29.0 6.0
24 John Minutes 29.0 43.0
25 John Distance 29.0 47.8
26 John Sprints 29.0 9.0
27 Bob Minutes 29.0 38.0
28 Bob Distance 29.0 26.8
29 Bob Sprints 29.0 9.0
30 John Minutes 29.0 35.0
31 John Distance 29.0 32.1
32 John Sprints 29.0 2.0
33 Bob Minutes 29.0 23.0
34 Bob Distance 29.0 12.5
35 Bob Sprints 29.0 2.0
So using pivot_table, you have to specify the aggregation function, here we use sum
, the default being mean
:
>>> data.pivot_table(index=['Athlete', 'week'], columns='Variable', values='value', aggfunc='sum')
Variable Distance Minutes Sprints
Athlete week
Bob 28.0 111.7 91.0 20.0
29.0 61.1 92.0 17.0
John 28.0 110.1 98.0 16.0
29.0 104.1 99.0 14.0
CodePudding user response:
Try with groupby
first then do pivot
and stack
out = df.groupby(level=0,axis=1).sum().pivot('Athlete','Variable').\
rename_axis(['week',None],axis=1).stack(level=0).reset_index()
Out[466]:
Athlete week Distance Minutes Sprints
0 Bob 28.0 111.7 91.0 20.0
1 Bob 29.0 61.1 92.0 17.0
2 John 28.0 110.1 98.0 16.0
3 John 29.0 104.1 99.0 14.0