Select rows whose value is less than 50% of the grouped sum

Time:03-13

I have a dataframe subdata:

PageId  VolumePred  ConversionPred  OSBrowser
1005581    8.0            7.0       (11, 16)
1016529  175.0           85.0       (11, 16)
1016529    4.0            4.0       (11, 17)
1016529    4.0            4.0       (12, 14)
1016529   29.0           19.0       (12, 16)

For each OSBrowser, I select only the "PageId" rows whose cumulative "VolumePred" is less than 44.

Here is the code, which works correctly:

h = subdata.groupby(["PageId", "OSBrowser"]).sum().reset_index().sort_values(["OSBrowser","ConversionPred"],ascending=[True,False])
output = h[h.groupby("OSBrowser")["VolumePred"].transform("cumsum").lt(44)].reset_index(drop=True)

But now I would like to replace the fixed value 44 with 50% of the sum of VolumePred for each "OSBrowser".

For example, the expected output for subdata is an empty dataframe: for OSBrowser (11, 16), the cumulative VolumePred values of the two records (175 and 183) are both greater than 0.5 * (175 + 8) = 91.5. The same holds for the other rows.

Any ideas, please? Thanks.

CodePudding user response:

or like this?

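# Aggregate per (PageId, OSBrowser), then order each OSBrowser by ConversionPred, highest first
h = subdata.groupby(["PageId", "OSBrowser"]).sum()\
    .reset_index()\
    .sort_values(['OSBrowser','ConversionPred'],ascending=[True,False])
# Total VolumePred of each OSBrowser, repeated on every row of that group
h['VolumePred_total'] = h.groupby("OSBrowser")["VolumePred"].transform("sum")
# Keep rows whose running VolumePred sum is below half of the group's total
output = h[h.groupby("OSBrowser")["VolumePred"].transform("cumsum").lt(h['VolumePred_total']*0.5)].reset_index(drop=True)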
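
To check this against the sample data in the question, here is a minimal self-contained sketch. It assumes OSBrowser is stored as a string (the question does not say whether it holds strings or tuples), and the names cum_volume and threshold are only illustrative.

```python
import pandas as pd

# Sample data from the question. Assumption: OSBrowser is stored as a string.
subdata = pd.DataFrame({
    "PageId": [1005581, 1016529, 1016529, 1016529, 1016529],
    "VolumePred": [8.0, 175.0, 4.0, 4.0, 29.0],
    "ConversionPred": [7.0, 85.0, 4.0, 4.0, 19.0],
    "OSBrowser": ["(11, 16)", "(11, 16)", "(11, 17)", "(12, 14)", "(12, 16)"],
})

# Aggregate per (PageId, OSBrowser), then sort by ConversionPred within each OSBrowser.
h = (
    subdata.groupby(["PageId", "OSBrowser"], as_index=False)[["VolumePred", "ConversionPred"]]
    .sum()
    .sort_values(["OSBrowser", "ConversionPred"], ascending=[True, False])
)

volume = h.groupby("OSBrowser")["VolumePred"]
cum_volume = volume.cumsum()               # running total within each OSBrowser
threshold = 0.5 * volume.transform("sum")  # half of each OSBrowser's total

output = h[cum_volume < threshold].reset_index(drop=True)
print(output.empty)
```

On this data every running total is at or above its group's threshold (for example 175 and 183 against 91.5 for "(11, 16)"), so the result is empty, as the question expects.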

CodePudding user response:

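# Aggregate per (PageId, OSBrowser), then order each OSBrowser by ConversionPred, highest first
h = subdata.groupby(["PageId", "OSBrowser"]).sum()\
    .reset_index()\
    .sort_values(['OSBrowser','ConversionPred'],ascending=[True,False])
# One total per OSBrowser (sum only the VolumePred column)
VolumePred_total = h.groupby('OSBrowser')["VolumePred"].sum().rename('VolumePred_total')
# Attach each group's total to its rows
h1 = h.merge(VolumePred_total,on='OSBrowser')
# Keep rows whose running VolumePred sum is below half of the group's total
output = h1[h1.groupby("OSBrowser")["VolumePred"].transform("cumsum")\
    .lt(h1["VolumePred_total"]*0.5)].reset_index(drop=True)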
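
Since the sample data in the question yields an empty result, it may help to see the same filter keep some rows. The sketch below wraps the logic in a function; the name select_below_fraction, its fraction parameter, and the demo data are all hypothetical and not from the question.

```python
import pandas as pd


def select_below_fraction(df: pd.DataFrame, fraction: float = 0.5) -> pd.DataFrame:
    """Keep rows whose cumulative VolumePred is below `fraction` of the OSBrowser total.

    Hypothetical helper: assumes the columns PageId, OSBrowser, VolumePred
    and ConversionPred used in the question.
    """
    h = (
        df.groupby(["PageId", "OSBrowser"], as_index=False)[["VolumePred", "ConversionPred"]]
        .sum()
        .sort_values(["OSBrowser", "ConversionPred"], ascending=[True, False])
    )
    volume = h.groupby("OSBrowser")["VolumePred"]
    # Running total within each OSBrowser compared against a share of the group's total.
    mask = volume.cumsum() < fraction * volume.transform("sum")
    return h[mask].reset_index(drop=True)


# Made-up data for illustration only.
demo = pd.DataFrame({
    "PageId": [1, 2, 3, 4, 5],
    "VolumePred": [10.0, 20.0, 70.0, 5.0, 5.0],
    "ConversionPred": [9.0, 5.0, 1.0, 3.0, 2.0],
    "OSBrowser": ["A", "A", "A", "B", "B"],
})
result = select_below_fraction(demo)
print(result)
```

In group "A" the running totals are 10, 30 and 100 against a threshold of 50, so pages 1 and 2 are kept. In group "B" the running totals are 5 and 10 against a threshold of 5, so nothing is kept because the comparison is strict.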