Home > database >  Calculating mean and count of rows for a bucket
Calculating mean and count of rows for a bucket

Time:03-26

I have a df as follows

MPG    Maker       Price

8     Toyota      20000
12    Toyota      18000
15    Toyota      19000
5     Honda       19000
4     Honda        20000

I am looking to bin by MPG and then calculate average price and number of elements in the bin. The DF I am looking to create is

MPG    Maker    Avg_Price   Num_Sold
0-8    Toyota    19000       3
9-15   Honda     19500       2

I followed the directions in Bucketing in python and calculating mean for a bucket and was able to get the average price, but I am unable to get the Num_Sold to work. I used

bins = [1,8,15]
df_bins = df.MAKER.groupby(pd.cut(df['MPG'],bins))
df_bins = df_bins.agg([np.mean, len]).reset_index(drop=True)

Any ideas on what I might be doing wrong? Thanks!

CodePudding user response:

Use named aggregation also with column Maker:

bins = [1,8,15]
df_bins = (df.groupby(['Maker',pd.cut(df['MPG'],bins)])
             .agg(Avg_Price=('Price','mean'),
                  Num_Sold= ('Price', 'size')).reset_index())

Or without Maker column:

bins = [1,8,15]
df_bins = (df.groupby([pd.cut(df['MPG'],bins)])
             .agg(Avg_Price=('Price','mean'),
                  Num_Sold= ('Price', 'size')).reset_index())
print (df_bins)
  • Related