I have a df of gambling transactions (small sample below). I want to be able to group the transactions of each unique user, but also to determine each user's success.
import pandas as pd
d = {'user_id': [1234, 5830, 3943, 1234, 5032, 5830,1234 ], 'win': [1, 0, 1, 0, 0, 1, 1],}
df= pd.DataFrame(data=d)
df
I can group the user id's and count to see how many times each occurs, but not do the second part- determine success for each user, would like to see win/loss in real terms and as a ratio for each user.
group = df.groupby('user_id')['user_id'].count()
print(group)
So my output would a a df with these columns 'User', 'Total_wins', 'Total_losses','win_loss_ratio'
CodePudding user response:
You could use groupby
agg
where you pass in count
and mean
:
out = df.groupby('user_id', as_index=False).agg(count=('win','count'), win_ratio=('win','mean'))
Output:
user_id count win_ratio
0 1234 3 0.666667
1 3943 1 1.000000
2 5032 1 0.000000
3 5830 2 0.500000
CodePudding user response:
Use GroupBy.agg
for count 1
with sum
, for count 0
lambda function and for ratio
use mean
:
df=(df.groupby('user_id',as_index=False).agg(Total_wins=('win','sum'),
Total_losses=('win', lambda x: x.eq(0).sum()),
win_loss_ratio=('win','mean')))
print (df)
user_id Total_wins Total_losses win_loss_ratio
0 1234 2 1 0.666667
1 3943 1 0 1.000000
2 5032 0 1 0.000000
3 5830 1 1 0.500000