I need to groupby multiple columns & then get Sum in a new column with added If condition. I tried the next code and it worked great with grouping by single column:
df['new column'] = (
df['value'].where(df['value'] > 0).groupby(df['column1']).transform('sum')
)
However, when I try to group by multiple columns I get an error.
df['new_column'] = (
df['value'].where(df['value'] > 0).groupby(df['column1', 'column2']).transform('sum')
)
Error:
->return self._engine.get_loc(casted_key)
The above exception was the direct cause of the following exception:
->indexer = self.columns.get_loc(key)
->raise KeyError(key) from err
->if is_scalar(key) and isna(key) and not self.hasnans: ('column1', 'column2')
Could you please advise how I should change the code to get the same result but grouping by multiple columns?
Thank you
CodePudding user response:
Cause of error
- The syntax to select multiple columns
df['column1', 'column2']
is wrong. This should bedf[['column1', 'column2']]
- Even if you use
df[['column1', 'column2']]
forgroupby
, pandas will raise another error complaining that the grouper should beone dimensional
. This is becausedf[['column1', 'column2']]
returns a dataframe which is a two dimensional object.
How to fix the error?
Hard way:
Pass each of the grouping columns as one dimensional series to groupby
df['new_column'] = (
df['value']
.where(df['value'] > 0)
.groupby([df['column1'], df['column2']]) # Notice the change
.transform('sum')
)
Easy way:
First assign the masked column values to the target column, then do groupby
transform
as you would normally do
df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')