How can i filter a query and then do a group by
df.query("'result_margin' > 100").groupby(['city','season','toss_winner','toss_decision','winner'])['winner'].size()
I am getting this error
TypeError: '>' not supported between instances of 'str' and 'int'
I am trying to filter where result_margin is greater than 100 then groupby with the columns specified and print records
CodePudding user response:
Using 'result_margin'
would treat it as a string, and not refer to the columns.
You would need to remove the quotes:
df.query("result_margin > 100").groupby(['city','season','toss_winner','toss_decision','winner'])['winner'].size()
Or if you might have columns that contain spaces, than add backticks:
df.query("`result_margin` > 100").groupby(['city','season','toss_winner','toss_decision','winner'])['winner'].size()
CodePudding user response:
You need to convert 'result_margin' to int. Try:
df['result_margin'] = df['result_margin'].astype(int)
For the filter, I always create a new dataframe.
df_new = df[df['result_margin']>100].groupby['city','season','toss_winner','toss_decision','winner']).agg(WinnerCount = pd.NamedAgg(column='winner',aggfunc='count'))
I don't use the size method but instead opt for using the agg method and create a new column. You can also try replacing the
agg(WinnerCount = pd.NamedAgg(column='winner',aggfunc='count'))
with
['winner'].size