Home > other >  How to filter using > operator and then apply groupby concept to a dataframe
How to filter using > operator and then apply groupby concept to a dataframe

Time:12-15

How can i filter a query and then do a group by

df.query("'result_margin' > 100").groupby(['city','season','toss_winner','toss_decision','winner'])['winner'].size()

I am getting this error

TypeError: '>' not supported between instances of 'str' and 'int'

I am trying to filter where result_margin is greater than 100 then groupby with the columns specified and print records

CodePudding user response:

Using 'result_margin' would treat it as a string, and not refer to the columns.

You would need to remove the quotes:

df.query("result_margin > 100").groupby(['city','season','toss_winner','toss_decision','winner'])['winner'].size()

Or if you might have columns that contain spaces, than add backticks:

df.query("`result_margin` > 100").groupby(['city','season','toss_winner','toss_decision','winner'])['winner'].size()

CodePudding user response:

You need to convert 'result_margin' to int. Try:

df['result_margin'] = df['result_margin'].astype(int)

For the filter, I always create a new dataframe.

df_new = df[df['result_margin']>100].groupby['city','season','toss_winner','toss_decision','winner']).agg(WinnerCount = pd.NamedAgg(column='winner',aggfunc='count'))

I don't use the size method but instead opt for using the agg method and create a new column. You can also try replacing the

agg(WinnerCount = pd.NamedAgg(column='winner',aggfunc='count'))

with

['winner'].size
  • Related