Home > Blockchain >  Pandas - adding a total row to each subgroup as first row
Pandas - adding a total row to each subgroup as first row

Time:09-20

I know this questions has been mentioned various times on StackOverflow, I do not find it trivial to accomplish this task. This and many other answers: Adding total row to pandas DataFrame groupby

Sample of my data (there are actually 25 columns for this but they are similar, only numerical):

owner   player  val1    val1    val3
A       x       5.60    3.18    0.76
A       y       12.08   15.95   -0.24
A       z       0.03    0.05    -0.41
B       x       0.02    0.01    2.06
B       z       2.36    2.37    0.00
C       x       0.16    0.15    0.05
C       y       0.72    0.75    -0.04
D       x       0.33    0.56    -0.41

My intended output is as follows, where for each owner the total is calculated and placed as the first row in the subgroup.

owner   player  val1    val1    val3
A      total    17.71   19.18   0.11
A      x        5.60    3.18    0.76
A      y        12.08   15.95   -0.24
A      z        0.03    0.05    -0.41
B      total    2.38    2.38    2.05
B      x        0.02    0.01    2.06
B      z        2.36    2.37    0.00
C      total    0.88    0.90    0.01
C      x        0.16    0.15    0.05
C      y        0.72    0.75    -0.04
D      total    0.33    0.56    -0.41
D      x        0.33    0.56    -0.41

I attempted to use something that I also found on StackOverflow which looked like what I was searching for but I couldn't make it quite right.

def lambda_t(x):
    df = x.sort_values(['owner']).drop(['owner'],axis=1)
    df.loc['total'] = df.sum()
    return df

df.groupby(['owner']).apply(lambda_t)

While in theory this might have been something interesting, the total is not placed where I want and on top of that the values on player name are concatenating, so I end up having a really packed column. This way I end up having a multiindex.

owner       player  val1    val1    val3
A   0        x      5.60    3.18    0.76
    1        y      12.08   15.95   -0.24
    2        z      0.03    0.05    -0.41
    total    xzy    17.71   19.18   0.11
.....

Apparently, dropping level of a multiindex can help but I'm missing the total this way, it disappears.

df.groupby(['owner']).apply(lambda_t).droplevel(level=1)

owner       player  val1    val1    val3
A            x      5.60    3.18    0.76
A            y      12.08   15.95   -0.24
A            z      0.03    0.05    -0.41
A            xzy    17.71   19.18   0.11

Any ideas if this is possible? I've seen that with groupby, assign and loc you can't order them properly.

CodePudding user response:

IIUC, you can use groupby.sum to compute the totals, assign the total name as player, concat the two DataFrames in order and sort_values with a stable method:

out = (pd
  .concat([df.groupby('owner', as_index=False).sum().assign(player='total'),
           df])
   .sort_values(by='owner', kind='stable', ignore_index=True)
   [df.columns]
)

output:

   owner player   val1    val1  val3
0      A  total  17.71   19.18  0.11
1      A      x   5.60    3.18  0.76
2      A      y  12.08   15.95 -0.24
3      A      z   0.03    0.05 -0.41
4      B  total   2.38    2.38  2.06
5      B      x   0.02    0.01  2.06
6      B      z   2.36    2.37  0.00
7      C  total   0.88    0.90  0.01
8      C      x   0.16    0.15  0.05
9      C      y   0.72    0.75 -0.04
10     D  total   0.33    0.56 -0.41
11     D      x   0.33    0.56 -0.41

CodePudding user response:

Another possible solution:

(df.groupby('owner')
 .apply(lambda x:
 pd.concat(
     [pd.concat([pd.DataFrame({'owner': x.owner.unique(), 'player': ['total']}),
      pd.DataFrame(x.iloc[:, 2:].apply(sum, axis=0)).T], axis=1),
      x]
 ))).reset_index(drop=True)

Output:

   owner player   val1   val2  val3
0      A  total  17.71  19.18  0.11
1      A      x   5.60   3.18  0.76
2      A      y  12.08  15.95 -0.24
3      A      z   0.03   0.05 -0.41
4      B  total   2.38   2.38  2.06
5      B      x   0.02   0.01  2.06
6      B      z   2.36   2.37  0.00
7      C  total   0.88   0.90  0.01
8      C      x   0.16   0.15  0.05
9      C      y   0.72   0.75 -0.04
10     D  total   0.33   0.56 -0.41
11     D      x   0.33   0.56 -0.41
  • Related