How to group values in pandas by quartiles and get metrics of other columns?-CodePudding

I have a pandas DataFrame. I wanna calculate sum and average of 2 another columns based on value condition in 1st column and get result for each group(sum, avg) and number of examples in each group.

 
#    quartile  value1   value2
#    1         1        0.9                   
#    1         1        0.8                   
#    2         1        0.75                   
#    2         0        0.75                   
#    3         0        0.5                  
#    3         0        0.4 
#    3         1        0.3                   
#    4         0        0.1

My desired output should look like:


#    quartile  value1_sum   value2_avg  no_of_instances_in_val1
#    1         2            0.85        2          
#    2         1            0.75        2           
#    3         1            0.40        3          
#    4         0            0.10        1

Anyone can help?

CodePudding user response：

Assuming your data:

data = pd.DataFrame({"quartile" : [1, 1, 2, 2, 3, 3, 3, 4], "value1" : [1, 1, 1, 0, 0, 0, 1, 0], "value2" : [0.9, 0.8, 0.75, 0.75, 0.5, 0.4, 0.3, 0.1]})

You can aggregate it in the following way:

data.groupby("quartile").agg({"value1" : sum, "value2" : mean, "quartile" : len})

You will then simply have to rename the columns as you desire.

CodePudding user response：

To adhere to your desired output better, we can use named aggregation, as follows:

(df.groupby('quartile', as_index=False)
   .agg(value1_sum=('value1', 'sum'), 
        value2_avg=('value2', 'mean'), 
        no_of_instances_in_val1=('value1', 'size'))
)

Output

   quartile  value1_sum  value2_avg  no_of_instances_in_val1
0         1           2        0.85                        2
1         2           1        0.75                        2
2         3           1        0.40                        3
3         4           0        0.10                        1