I am looking for a way to sum up the values > or < a certain threshold in a given column (here > 6 in days_install_to_event column).
I tried many different ways, such a loc, query or groupby, but it return only the values > 6 not the ones < 6.
Here some of the things I have tried:
df = pd.DataFrame({
'custom_action' : ['First_puchase', 'First_puchase', 'First_puchase', 'First_puchase',
'First_puchase', 'First_puchase', 'First_puchase', 'First_puchase'],
'days_install_to_event' : [1, 2, 3, 4, 5, 6, 7, 8],
'number_unique_users' : [1350, 250, 13, 2, 1, 2, 1, 2]})
df
custom_action days_install_to_event number_unique_users
0 First_puchase 1 1350
1 First_puchase 2 250
2 First_puchase 3 13
3 First_puchase 4 2
4 First_puchase 5 1
5 First_puchase 6 2
6 First_puchase 7 1
7 First_puchase 8 2
8 First_puchase 9 3
9 First_puchase 10 2
df_1 = df.loc[df['days_install_to_event'] > 6].sum()
df_2 = df.query("days_install_to_event > 6")['number_unique_users'].sum()
df_1
df_2
Output:
custom_action First_puchaseFirst_puchase
days_install_to_event 34
number_unique_users 8
8
Desired output:
custom_action days_install_to_event number_unique_users
0 First_puchase 1 1350
1 First_puchase 2 250
2 First_puchase 3 13
3 First_puchase 4 2
4 First_puchase 5 1
5 First_puchase 6 2
6 First_puchase 7 8
In advance, sorry if a very similar question have been asked, I have been looking around for past 2 days but found nothing that could match exactly what I was looking for. It may be due to formulation.
Thanks for your help :)
CodePudding user response:
As far as I know there is no out-of-the-box solution for this but you can get this result by creating a helper grouper column:
# Set days_install_to_event = 7 if the value is larger than 6
grouper = df['days_install_to_event'].mask(df['days_install_to_event'] > 6, '7 ')
Then, with the help of this column, you can use groupby.agg
:
In [27]: df.groupby(grouper).agg({
'number_unique_users': 'sum',
'custom_action': 'first',
}).reset_index()
Out[27]:
days_install_to_event number_unique_users custom_action
0 1 1350 First_puchase
1 2 250 First_puchase
2 3 13 First_puchase
3 4 2 First_puchase
4 5 1 First_puchase
5 6 2 First_puchase
6 7 8 First_puchase