In Pandas, how to compute value counts on bins and sum value in 1 other column-CodePudding

I have Pandas dataframe like:

   df =
         col1        col2

         23          75
         25          78
         22          120

I want to specify bins: 0-100 and 100-200 and divide col2 in those bins compute its value counts and sum col1 for the values that lie in those bins.

 So:
     df_output:
        col2_range          count         col1_cum
         0-100               2              48
         100-200             1              22

Getting the col2_range and count is pretty simple:

         import numpy as np
         a = np.arange(0,200, 100)
         bins = a.tolist()
         counts = data['col1'].value_counts(bins=bins, sort=False)

How do I get to sum col2 though?

CodePudding user response：

IIUC, try using pd.cut to create bins and groupby those bins:

g = pd.cut(df['col2'], 
           bins=[0, 100, 200, 300, 400], 
           labels = ['0-99', '100-199', '200-299', '300-399'])
df.groupby(g, observed=True)['col1'].agg(['count','sum']).reset_index()

Output:

      col2  count  sum
0     0-99      2   48
1  100-199      1   22

I think I misread the original post:

g = pd.cut(df['col2'], 
           bins=[0,100,200,300,400], 
           labels = ['0-99', '100-199', '200-299', '300-399'])
df.groupby(g, observed=True).agg(col1_count=('col1','count'),
                                 col2_sum=('col2','sum'),
                                 col1_sum=('col1','sum')).reset_index()

Output:

      col2  col1_count  col2_sum  col1_sum
0     0-99           2       153        48
1  100-199           1       120        22