Sum of columns based on grouping

Time:11-23

import pandas as pd

d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)

So I have a df and a dictionary that assigns each column to a group. I'm having trouble writing a function that creates a total (sum) column for each group. These would be the first 2 rows of the desired output:

a  b  c  d  total_top  total_bot
0  1  0  1          0          2
1  1  0  0          1          1

and so on. Ideally, the function would take the dataframe and the name of the group as inputs.
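A minimal sketch of the helper the question describes, assuming the group mapping is a plain dict like dct. The name group_total is hypothetical (it is not a pandas API); it simply selects the columns that the dictionary assigns to one group and sums them row by row.

```python
import pandas as pd

d = {'a': [0, 1, 0, 1, 1, 1], 'b': [1, 1, 1, 1, 1, 1],
     'c': [0, 0, 0, 0, 0, 1], 'd': [1, 0, 1, 1, 1, 0]}
dct = {'a': 'top', 'b': 'bot', 'c': 'top', 'd': 'bot'}
df = pd.DataFrame(d)


def group_total(frame, mapping, group):
    """Row-wise sum of every column that `mapping` assigns to `group`.

    Hypothetical helper for illustration only.
    """
    # Columns missing from the mapping (e.g. earlier total_* columns)
    # give None from .get(), so they are never selected.
    cols = [col for col in frame.columns if mapping.get(col) == group]
    return frame[cols].sum(axis=1)


# dict.fromkeys keeps the unique group names in first-seen order: top, bot.
for name in dict.fromkeys(dct.values()):
    df[f'total_{name}'] = group_total(df, dct, name)

print(df.head(2))
```

Because the helper takes the frame, the mapping and a single group name, it can also be called on its own, e.g. group_total(df, dct, 'top').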

Answer:

Try the following:

sums = df.rename(columns=dct).stack().groupby(level=(0,1)).sum().unstack()
df_new = df.join(sums)
df_new
   a  b  c  d  bot  top
0  0  1  0  1    2    0
1  1  1  0  0    1    1
2  0  1  0  1    2    0
3  1  1  0  1    2    1
4  1  1  0  1    2    1
5  1  1  1  0    1    2

Explanation:

  1. df.rename(columns=dct) renames the columns to top and bot according to the dictionary dct.
df.rename(columns=dct)

   top  bot  top  bot
0    0    1    0    1
1    1    1    0    0
2    0    1    0    1
3    1    1    0    1
4    1    1    0    1
5    1    1    1    0
  2. Then stack() moves the column labels into the row index, so you get a single column of 1s and 0s with a two-level index (row number, group name).
df.rename(columns=dct).stack()

0  top    0
   bot    1
   top    0
   bot    1
1  top    1
   bot    1
   top    0
   bot    0
2  top    0
   bot    1
   top    0
   bot    1
3  top    1
   bot    1
   top    0
   bot    1
4  top    1
   bot    1
   top    0
   bot    1
5  top    1
   bot    1
   top    1
   bot    0
dtype: int64
  3. groupby(level=(0,1)).sum() groups on level 0 (the original row index) and level 1 (the top/bot labels) and adds up the values in each group.
df.rename(columns=dct).stack().groupby(level=(0,1)).sum()

0  bot    2
   top    0
1  bot    1
   top    1
2  bot    2
   top    0
3  bot    2
   top    1
4  bot    2
   top    1
5  bot    1
   top    2
dtype: int64
  4. unstack() pivots level 1 back into columns, giving one column each for bot and top.
df.rename(columns=dct).stack().groupby(level=(0,1)).sum().unstack()

   bot  top
0    2    0
1    1    1
2    2    0
3    2    1
4    2    1
5    1    2
  5. Finally, join the original df with this new sums DataFrame on the index.
df.join(sums)

   a  b  c  d  bot  top
0  0  1  0  1    2    0
1  1  1  0  0    1    1
2  0  1  0  1    2    0
3  1  1  0  1    2    1
4  1  1  0  1    2    1
5  1  1  1  0    1    2
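The same stack/groupby/unstack idea can be sketched without creating duplicate column labels: stack the original (unique) labels a to d, then translate the column level into group names with dct before grouping. This is a variation on the answer, not its exact code; the motivation is an assumption on my part that the newer stack() implementation in recent pandas releases may reject duplicate column labels such as the repeated top/bot produced by rename. add_prefix is added so the columns match the names asked for in the question.

```python
import pandas as pd

d = {'a': [0, 1, 0, 1, 1, 1], 'b': [1, 1, 1, 1, 1, 1],
     'c': [0, 0, 0, 0, 0, 1], 'd': [1, 0, 1, 1, 1, 0]}
dct = {'a': 'top', 'b': 'bot', 'c': 'top', 'd': 'bot'}
df = pd.DataFrame(d)

# Series indexed by (row label, original column label).
stacked = df.stack()

# Level 0 stays as the row label; level 1 is mapped to its group name.
rows = stacked.index.get_level_values(0)
groups = stacked.index.get_level_values(1).map(dct)

# Sum within each (row, group) pair, then pivot the groups into columns.
sums = stacked.groupby([rows, groups]).sum().unstack().add_prefix('total_')

df_new = df.join(sums)
print(df_new)
```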

Answer:

  1. map the column names to groups
  2. groupby the resulting values along axis=1 and sum
  3. add_prefix to name the new columns
  4. join the result to df
>>> df.join(df.groupby(df.columns.map(dct), axis=1).sum().add_prefix("total_"))

   a  b  c  d  total_bot  total_top
0  0  1  0  1          2          0
1  1  1  0  0          1          1
2  0  1  0  1          2          0
3  1  1  0  1          2          1
4  1  1  0  1          2          1
5  1  1  1  0          1          2
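In pandas 2.1 and later, groupby(..., axis=1) is deprecated, and the deprecation message points to transposing instead. A sketch of the same one-liner under that assumption: after df.T the original column labels form the index, so dct works as an ordinary grouper, and a second .T flips the sums back into columns.

```python
import pandas as pd

d = {'a': [0, 1, 0, 1, 1, 1], 'b': [1, 1, 1, 1, 1, 1],
     'c': [0, 0, 0, 0, 0, 1], 'd': [1, 0, 1, 1, 1, 0]}
dct = {'a': 'top', 'b': 'bot', 'c': 'top', 'd': 'bot'}
df = pd.DataFrame(d)

# Transpose so the column labels a..d become row labels, group them
# with the dict, sum, and transpose back to one column per group.
totals = df.T.groupby(dct).sum().T.add_prefix('total_')

df_new = df.join(totals)
print(df_new)
```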

Answer:

Group the columns with dct and concatenate the result back onto df:

grouped = df.groupby(dct, axis=1).sum().add_prefix('total_')

pd.concat([df, grouped], axis=1)
 
   a  b  c  d  total_bot  total_top
0  0  1  0  1          2          0
1  1  1  0  0          1          1
2  0  1  0  1          2          0
3  1  1  0  1          2          1
4  1  1  0  1          2          1
5  1  1  1  0          1          2
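All three answers end up with total_bot before total_top because groupby sorts the group keys by default, while the question lists total_top first. A sketch, assuming the question's order matters: passing sort=False keeps the groups in order of first appearance (column a maps to top, b to bot). It uses the transpose form rather than axis=1, since axis=1 is deprecated in recent pandas.

```python
import pandas as pd

d = {'a': [0, 1, 0, 1, 1, 1], 'b': [1, 1, 1, 1, 1, 1],
     'c': [0, 0, 0, 0, 0, 1], 'd': [1, 0, 1, 1, 1, 0]}
dct = {'a': 'top', 'b': 'bot', 'c': 'top', 'd': 'bot'}
df = pd.DataFrame(d)

# sort=False: groups appear in first-seen order (top, then bot)
# instead of alphabetical order (bot, then top).
grouped = df.T.groupby(dct, sort=False).sum().T.add_prefix('total_')

result = pd.concat([df, grouped], axis=1)
print(result.head(2))
```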