Groupby, sum, reset index & keep first all together

Time:05-17

I want to group by two columns (out of dozens), sum the values of two other columns, and keep the first value of every remaining column. No combination I have tried works.

Code used:

df1 = df.groupby(['col_1', 'Col_2'], as_index = False)[['Age', 'Income']].apply(sum).first()

I am getting the following error, which leads me to believe this can be done with a slightly different version of my code:

TypeError: first() missing 1 required positional argument: 'offset'

Any suggestions would be more than appreciated!

CodePudding user response:

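The error most likely occurs because .apply(sum) returns a regular DataFrame, and DataFrame.first is a time-series method that requires an offset argument, not the groupby 'first' aggregation. Instead, you can use agg and specify the aggregation function for each column.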

group = ['col_1', 'col_2']
(df.groupby(group, as_index=False)
 .agg({
    **{x: 'first' for x in df.columns[~df.columns.isin(group)]}, # for all columns other than grouping column
    **{'Age': 'sum', 'Income': 'sum'} # Overwrite aggregation for specific columns
 })
)

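The { **{...}, **{...} } expression merges the two dicts, with keys from the second overriding those from the first, so it produces a mapping equivalent to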

{
   'Age': 'sum',
   'Income': 'sum',
   'othercol': 'first',
   'morecol': 'first'
}
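To check the approach end to end, here is a minimal sketch on a small, made-up DataFrame. The sample values and the names othercol, morecol, sum_cols, and agg_map are assumptions for illustration only; col_1, col_2, Age, and Income come from the question.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b'],
    'col_2': ['x', 'x', 'y', 'y'],
    'Age': [10, 20, 30, 40],
    'Income': [100, 200, 300, 400],
    'othercol': ['p', 'q', 'r', 's'],
    'morecol': [1.5, 2.5, 3.5, 4.5],
})

group = ['col_1', 'col_2']
sum_cols = ['Age', 'Income']

# Default every non-grouping column to 'first' ...
agg_map = {col: 'first' for col in df.columns if col not in group}
# ... then override the columns that should be summed.
agg_map.update({col: 'sum' for col in sum_cols})

# as_index=False keeps the grouping keys as ordinary columns,
# so no separate reset_index() call is needed.
df1 = df.groupby(group, as_index=False).agg(agg_map)
print(df1)
```

Each group collapses to one row: Age and Income are summed, and the other columns keep the value from the group's first row. Note that the groupby 'first' aggregation skips missing values by default, so if a column starts with NaN in a group, the first non-null value is returned instead.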