Home > Back-end >  Computed column with group by in a dataframe
Computed column with group by in a dataframe

Time:03-19

I have a dataframe :

Page_ID   Volume   Conversion   KPI     OSBR
A          100       10         0.7    (10,12)
A          150       11         0.2    (10,12)
B          100       11         0.4    (11,16)

I would like to goupe all the Page_ID by OSBR by counting the sum of Volume and Conversion , and the KPI should be equal the sum of Conversion devided by Conversion.

The expected result should be :

Page_ID   Volume   Conversion   KPI               OSBR
A          250       21         0.084(21/250)     (10,12)
B          100       11         0.110(11/100)     (11,16)

I tryed with this code :

subdata1=df.groupby(["PageId", "OSBrowser"]).sum().reset_index()

But the result for KPI is uncorrect cause it counted the sum .

Any idea please to solve it ? thanks

CodePudding user response:

If I understand correctly:

x = df.groupby(['Page_ID', 'OSBR']).agg({'Volume': 'sum', 'Conversion':'sum'})
x['KPI'] = x['Conversion'] / x['Volume']
x = x.reset_index()

Output:

>>> x
  Page_ID     OSBR  Volume  Conversion    KPI
0       A  (10,12)     250          21  0.084
1       B  (11,16)     100          11  0.110

CodePudding user response:

This would be my approach assuming the final KPI is sum(Conversion)/sum(Volume) per group:

(df
 .groupby(["Page_ID", "OSBR"], as_index=False).sum()
 .assign(KPI=lambda d: d['Conversion']/d['Volume'])
 [df.columns]
)

or with eval if the variable names are valid python words:

(df
 .groupby(["Page_ID", "OSBR"], as_index=False).sum()
 .eval('KPI=Conversion/Volume')
 [df.columns]
)

output:

  Page_ID  Volume  Conversion    KPI     OSBR
0       A     250          21  0.084  (10,12)
1       B     100          11  0.110  (11,16)
  • Related