I know this question has come up various times on StackOverflow, but I still don't find this task trivial to accomplish. I have looked at this and many other answers: Adding total row to pandas DataFrame groupby
A sample of my data (there are actually 25 columns, but they are similar and all numerical):
owner player val1 val2 val3
A x 5.60 3.18 0.76
A y 12.08 15.95 -0.24
A z 0.03 0.05 -0.41
B x 0.02 0.01 2.06
B z 2.36 2.37 0.00
C x 0.16 0.15 0.05
C y 0.72 0.75 -0.04
D x 0.33 0.56 -0.41
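For reproducibility, the sample can be rebuilt as a DataFrame like this (a minimal sketch based on the table above):

import pandas as pd

# sample data reconstructed from the table above
df = pd.DataFrame({
    'owner':  ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'],
    'player': ['x', 'y', 'z', 'x', 'z', 'x', 'y', 'x'],
    'val1':   [5.60, 12.08, 0.03, 0.02, 2.36, 0.16, 0.72, 0.33],
    'val2':   [3.18, 15.95, 0.05, 0.01, 2.37, 0.15, 0.75, 0.56],
    'val3':   [0.76, -0.24, -0.41, 2.06, 0.00, 0.05, -0.04, -0.41],
})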
My intended output is as follows, where for each owner the total is calculated and placed as the first row in the subgroup.
owner player val1 val2 val3
A total 17.71 19.18 0.11
A x 5.60 3.18 0.76
A y 12.08 15.95 -0.24
A z 0.03 0.05 -0.41
B total 2.38 2.38 2.06
B x 0.02 0.01 2.06
B z 2.36 2.37 0.00
C total 0.88 0.90 0.01
C x 0.16 0.15 0.05
C y 0.72 0.75 -0.04
D total 0.33 0.56 -0.41
D x 0.33 0.56 -0.41
I tried an approach I also found on StackOverflow that looked like what I was searching for, but I couldn't get it quite right:
def lambda_t(x):
    df = x.sort_values(['owner']).drop(['owner'], axis=1)
    df.loc['total'] = df.sum()
    return df

df.groupby(['owner']).apply(lambda_t)
While in theory this looked promising, the total row is not placed where I want it, and on top of that the player names get concatenated, so I end up with a really packed column. I also end up with a MultiIndex:
owner player val1 val2 val3
A 0 x 5.60 3.18 0.76
1 y 12.08 15.95 -0.24
2 z 0.03 0.05 -0.41
total xzy 17.71 19.18 0.11
.....
Apparently, dropping a level of the MultiIndex can help, but then the 'total' label is lost; it disappears:
df.groupby(['owner']).apply(lambda_t).droplevel(level=1)
owner player val1 val2 val3
A x 5.60 3.18 0.76
A y 12.08 15.95 -0.24
A z 0.03 0.05 -0.41
A xzy 17.71 19.18 0.11
Any ideas on whether this is possible? From what I've seen, with groupby, assign and loc you can't order the rows properly.
CodePudding user response:
IIUC, you can use groupby.sum to compute the totals, assign the name 'total' as the player, concat the two DataFrames in order, and sort_values with a stable method so that, within each owner, the total row (concatenated first) ends up on top:
out = (pd
       # per-owner totals first, then the original rows
       .concat([df.groupby('owner', as_index=False).sum().assign(player='total'),
                df])
       # stable sort keeps the total row on top of each owner group
       .sort_values(by='owner', kind='stable', ignore_index=True)
       # restore the original column order
       [df.columns]
      )
output:
owner player val1 val2 val3
0 A total 17.71 19.18 0.11
1 A x 5.60 3.18 0.76
2 A y 12.08 15.95 -0.24
3 A z 0.03 0.05 -0.41
4 B total 2.38 2.38 2.06
5 B x 0.02 0.01 2.06
6 B z 2.36 2.37 0.00
7 C total 0.88 0.90 0.01
8 C x 0.16 0.15 0.05
9 C y 0.72 0.75 -0.04
10 D total 0.33 0.56 -0.41
11 D x 0.33 0.56 -0.41
CodePudding user response:
Another possible solution:
(df.groupby('owner')
   .apply(lambda x:
          pd.concat(
              # one-row total (owner, 'total', column sums) placed before the group's own rows
              [pd.concat([pd.DataFrame({'owner': x.owner.unique(), 'player': ['total']}),
                          pd.DataFrame(x.iloc[:, 2:].apply(sum, axis=0)).T], axis=1),
               x]
          ))).reset_index(drop=True)
Output:
owner player val1 val2 val3
0 A total 17.71 19.18 0.11
1 A x 5.60 3.18 0.76
2 A y 12.08 15.95 -0.24
3 A z 0.03 0.05 -0.41
4 B total 2.38 2.38 2.06
5 B x 0.02 0.01 2.06
6 B z 2.36 2.37 0.00
7 C total 0.88 0.90 0.01
8 C x 0.16 0.15 0.05
9 C y 0.72 0.75 -0.04
10 D total 0.33 0.56 -0.41
11 D x 0.33 0.56 -0.41