I can't think of how to do this:
As the headline explains I want to group a dataframe by the column acquired_month
only if another column contains Closed Won
(in the example I made a helper column that just marks True
if that condition is fulfilled although I'm not sure that step is necessary). Then if those conditions are met I want to sum the values of a third column but can't think how to do it. Here is my code so far:
us_lead_scoring.loc[us_lead_scoring['Stage'].str.contains('Closed Won'), 'closed_won_binary'] = True acquired_date = us_lead_scoring.groupby('acquired_month')['closed_won_binary'].sum()
but this just sums the true false column not the sum
column if the true false column is true after the acquired_month
groupby. Any direction appreciated.
Thanks
CodePudding user response:
If need aggregate column col
replace non matched values to 0
values in Series.where
and then aggregate sum
:
us_lead_scoring = pd.DataFrame({'Stage':['Closed Won1','Closed Won2','Closed', 'Won'],
'col':[1,3,5,6],
'acquired_month':[1,1,1,2]})
out = (us_lead_scoring['col'].where(us_lead_scoring['Stage']
.str.contains('Closed Won'), 0)
.groupby(us_lead_scoring['acquired_month'])
.sum()
.reset_index(name='SUM'))
print (out)
acquired_month SUM
0 1 4
1 2 0