I have a dataframe :
Page_ID Volume Conversion KPI OSBR
A 100 10 0.7 (10,12)
A 150 11 0.2 (10,12)
B 100 11 0.4 (11,16)
I would like to goupe all the Page_ID by OSBR by counting the sum of Volume and Conversion , and the KPI should be equal the sum of Conversion devided by Conversion.
The expected result should be :
Page_ID Volume Conversion KPI OSBR
A 250 21 0.084(21/250) (10,12)
B 100 11 0.110(11/100) (11,16)
I tryed with this code :
subdata1=df.groupby(["PageId", "OSBrowser"]).sum().reset_index()
But the result for KPI is uncorrect cause it counted the sum .
Any idea please to solve it ? thanks
CodePudding user response:
If I understand correctly:
x = df.groupby(['Page_ID', 'OSBR']).agg({'Volume': 'sum', 'Conversion':'sum'})
x['KPI'] = x['Conversion'] / x['Volume']
x = x.reset_index()
Output:
>>> x
Page_ID OSBR Volume Conversion KPI
0 A (10,12) 250 21 0.084
1 B (11,16) 100 11 0.110
CodePudding user response:
This would be my approach assuming the final KPI is sum(Conversion)/sum(Volume) per group:
(df
.groupby(["Page_ID", "OSBR"], as_index=False).sum()
.assign(KPI=lambda d: d['Conversion']/d['Volume'])
[df.columns]
)
or with eval
if the variable names are valid python words:
(df
.groupby(["Page_ID", "OSBR"], as_index=False).sum()
.eval('KPI=Conversion/Volume')
[df.columns]
)
output:
Page_ID Volume Conversion KPI OSBR
0 A 250 21 0.084 (10,12)
1 B 100 11 0.110 (11,16)