I have a dataframe subdata:
PageId VolumePred ConversionPred OSBrowser
1005581 8.0 7.0 (11, 16)
1016529 175.0 85.0 (11, 16)
1016529 4.0 4.0 (11, 17)
1016529 4.0 4.0 (12, 14)
1016529 29.0 19.0 (12, 16)
For each OSBrowser, I keep only the "PageId" rows whose cumulative "VolumePred" is less than 44.
Here is the code, which works correctly:
h = subdata.groupby(["PageId", "OSBrowser"]).sum().reset_index().sort_values(["OSBrowser","ConversionPred"],ascending=[True,False])
output = h[h.groupby("OSBrowser")["VolumePred"].transform("cumsum").lt(44)].reset_index(drop=True)
Now I would like to replace the fixed value 44 with 50% of the total "VolumePred" of each "OSBrowser".
For example, the expected output for subdata is an empty dataframe: for OSBrowser (11, 16), the cumulative "VolumePred" of the two records (175 and 183) is bigger than 0.5 * (175 + 8) = 91.5, and the same holds for the other rows.
Any ideas, please? Thanks.
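To make the expected result concrete, here is a minimal sketch that rebuilds subdata from the table above and prints, for each row, the cumulative "VolumePred" next to half of the group total. The column names cum_volume, threshold and keep are only illustrative, and OSBrowser is stored as a string here (an assumption, the real column may hold tuples).

```python
import pandas as pd

# Sample rows from the question. OSBrowser is kept as a string here,
# which is an assumption made to keep the sketch simple.
subdata = pd.DataFrame({
    "PageId": [1005581, 1016529, 1016529, 1016529, 1016529],
    "VolumePred": [8.0, 175.0, 4.0, 4.0, 29.0],
    "ConversionPred": [7.0, 85.0, 4.0, 4.0, 19.0],
    "OSBrowser": ["(11, 16)", "(11, 16)", "(11, 17)", "(12, 14)", "(12, 16)"],
})

# Same aggregation and ordering as the working code above.
h = (
    subdata.groupby(["PageId", "OSBrowser"], as_index=False)[["VolumePred", "ConversionPred"]]
    .sum()
    .sort_values(["OSBrowser", "ConversionPred"], ascending=[True, False])
    .reset_index(drop=True)
)

grouped = h.groupby("OSBrowser")["VolumePred"]
h["cum_volume"] = grouped.cumsum()               # running total inside each OSBrowser
h["threshold"] = 0.5 * grouped.transform("sum")  # 50% of the group total, repeated per row
h["keep"] = h["cum_volume"] < h["threshold"]     # illustrative flag column

print(h)
```

No row has keep set to True on this sample, which is why the expected output is empty.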
CodePudding user response:
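You could compute each group's total with transform("sum") and compare the cumulative sum against half of it, like this: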
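# Aggregate per (PageId, OSBrowser), then order each OSBrowser by ConversionPred, highest first
h = subdata.groupby(["PageId", "OSBrowser"]).sum()\
.reset_index()\
.sort_values(['OSBrowser','ConversionPred'],ascending=[True,False])
# Total VolumePred of each OSBrowser, repeated on every row of that group
h['VolumePred_total'] = h.groupby("OSBrowser")["VolumePred"].transform("sum")
# Keep rows whose running total is still below 50% of the group total
output = h[h.groupby("OSBrowser")["VolumePred"].transform("cumsum").lt(h['VolumePred_total']*0.5)].reset_index(drop=True)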
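Since the question's sample yields an empty result, here is a small sketch of the same idea on made-up data where some rows survive. The helper name filter_below_fraction, its fraction parameter and the demo values are all hypothetical, chosen only for illustration.

```python
import pandas as pd

def filter_below_fraction(df, fraction=0.5):
    """Keep rows whose running VolumePred stays below fraction * group total.

    Hypothetical helper, not part of pandas.
    """
    h = (
        df.groupby(["PageId", "OSBrowser"], as_index=False)[["VolumePred", "ConversionPred"]]
        .sum()
        .sort_values(["OSBrowser", "ConversionPred"], ascending=[True, False])
    )
    grouped = h.groupby("OSBrowser")["VolumePred"]
    # cumsum() and transform("sum") both return Series aligned with h's index
    mask = grouped.cumsum() < fraction * grouped.transform("sum")
    return h[mask].reset_index(drop=True)

# Made-up data: group "A" totals 100, group "B" has a single row.
demo = pd.DataFrame({
    "PageId": [1, 2, 3, 4],
    "VolumePred": [10.0, 20.0, 70.0, 5.0],
    "ConversionPred": [9.0, 5.0, 1.0, 2.0],
    "OSBrowser": ["A", "A", "A", "B"],
})

result = filter_below_fraction(demo)
print(result)
```

In group "A" the running totals are 10, 30 and 100 against a threshold of 50, so pages 1 and 2 are kept. A group with a single row and a positive volume is always dropped, because its running total equals the full group total and the comparison is strict.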
CodePudding user response:
h = subdata.groupby(["PageId", "OSBrowser"]).sum()\
.reset_index()\
.sort_values(['OSBrowser','ConversionPred'],ascending=[True,False])
# Total VolumePred per OSBrowser, as a named Series so it can be merged back
VolumePred_total = h.groupby('OSBrowser')["VolumePred"].sum().rename('VolumePred_total')
h1 = h.merge(VolumePred_total,on='OSBrowser')
# Keep rows whose running total is still below 50% of the group total
output = h1[h1.groupby("OSBrowser")["VolumePred"].transform("cumsum")\
.lt(h1["VolumePred_total"]*0.5)].reset_index(drop=True)
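As a sanity check, the merge-based variant and a transform("sum") variant should select the same rows. The sketch below compares them on made-up data; the demo values and the names out_transform and out_merge are hypothetical.

```python
import pandas as pd

# Made-up data, for illustration only.
demo = pd.DataFrame({
    "PageId": [10, 11, 12, 10, 13],
    "VolumePred": [5.0, 15.0, 30.0, 4.0, 6.0],
    "ConversionPred": [3.0, 2.0, 1.0, 1.0, 2.0],
    "OSBrowser": ["X", "X", "X", "Y", "Y"],
})

h = (
    demo.groupby(["PageId", "OSBrowser"], as_index=False)[["VolumePred", "ConversionPred"]]
    .sum()
    .sort_values(["OSBrowser", "ConversionPred"], ascending=[True, False])
    .reset_index(drop=True)
)

# Variant 1: broadcast the group total onto every row with transform("sum")
cum = h.groupby("OSBrowser")["VolumePred"].cumsum()
total = h.groupby("OSBrowser")["VolumePred"].transform("sum")
out_transform = h[cum < 0.5 * total].reset_index(drop=True)

# Variant 2: compute one total per group, then merge it back as a column
totals = h.groupby("OSBrowser")["VolumePred"].sum().rename("VolumePred_total")
h1 = h.merge(totals, on="OSBrowser")
cum1 = h1.groupby("OSBrowser")["VolumePred"].cumsum()
out_merge = (
    h1[cum1 < 0.5 * h1["VolumePred_total"]]
    .drop(columns="VolumePred_total")
    .reset_index(drop=True)
)

print(out_transform)
print(out_transform.equals(out_merge))
```

The transform variant avoids the extra merge step, while the merge variant keeps the totals available as a separate Series if they are needed elsewhere.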