I have a pandas DataFrame. I wanna calculate sum and average of 2 another columns based on value condition in 1st column and get result for each group(sum, avg) and number of examples in each group.
# quartile value1 value2
# 1 1 0.9
# 1 1 0.8
# 2 1 0.75
# 2 0 0.75
# 3 0 0.5
# 3 0 0.4
# 3 1 0.3
# 4 0 0.1
My desired output should look like:
# quartile value1_sum value2_avg no_of_instances_in_val1
# 1 2 0.85 2
# 2 1 0.75 2
# 3 1 0.40 3
# 4 0 0.10 1
Anyone can help?
CodePudding user response:
Assuming your data:
data = pd.DataFrame({"quartile" : [1, 1, 2, 2, 3, 3, 3, 4], "value1" : [1, 1, 1, 0, 0, 0, 1, 0], "value2" : [0.9, 0.8, 0.75, 0.75, 0.5, 0.4, 0.3, 0.1]})
You can aggregate it in the following way:
data.groupby("quartile").agg({"value1" : sum, "value2" : mean, "quartile" : len})
You will then simply have to rename the columns as you desire.
CodePudding user response:
To adhere to your desired output better, we can use named aggregation, as follows:
(df.groupby('quartile', as_index=False)
.agg(value1_sum=('value1', 'sum'),
value2_avg=('value2', 'mean'),
no_of_instances_in_val1=('value1', 'size'))
)
Output
quartile value1_sum value2_avg no_of_instances_in_val1
0 1 2 0.85 2
1 2 1 0.75 2
2 3 1 0.40 3
3 4 0 0.10 1