Home > Net >  Groupby multiple columns & Sum - Create new column with added If Condition
Groupby multiple columns & Sum - Create new column with added If Condition

Time:05-16

I need to groupby multiple columns & then get Sum in a new column with added If condition. I tried the next code and it worked great with grouping by single column:

df['new column'] = (
    df['value'].where(df['value'] > 0).groupby(df['column1']).transform('sum')
)

However, when I try to group by multiple columns I get an error.

df['new_column'] = (
        df['value'].where(df['value'] > 0).groupby(df['column1', 'column2']).transform('sum')
    )

Error:

->return self._engine.get_loc(casted_key) 
The above exception was the direct cause of the following exception: 
->indexer = self.columns.get_loc(key) 
->raise KeyError(key) from err 
->if is_scalar(key) and isna(key) and not self.hasnans: ('column1', 'column2')

Could you please advise how I should change the code to get the same result but grouping by multiple columns?

Thank you

CodePudding user response:

Cause of error

  • The syntax to select multiple columns df['column1', 'column2'] is wrong. This should be df[['column1', 'column2']]
  • Even if you use df[['column1', 'column2']] for groupby, pandas will raise another error complaining that the grouper should be one dimensional. This is because df[['column1', 'column2']] returns a dataframe which is a two dimensional object.

How to fix the error?

Hard way:

Pass each of the grouping columns as one dimensional series to groupby

df['new_column'] = (
        df['value']
          .where(df['value'] > 0)
          .groupby([df['column1'], df['column2']]) # Notice the change
          .transform('sum')
)
Easy way:

First assign the masked column values to the target column, then do groupby transform as you would normally do

df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')
  • Related