Pandas: grouping by unique user and profiling results

I have a df of gambling transactions (small sample below). I want to group the transactions by unique user and also determine each user's success.

import pandas as pd

d = {'user_id': [1234, 5830, 3943, 1234, 5032, 5830, 1234], 'win': [1, 0, 1, 0, 0, 1, 1]}

df = pd.DataFrame(data=d)
df

I can group by user ID and count how many times each user occurs, but I can't work out the second part: determining each user's success. I would like to see wins and losses as absolute counts and as a ratio for each user.

group = df.groupby('user_id')['user_id'].count()
print(group)

So my output would be a df with these columns: 'User', 'Total_wins', 'Total_losses', 'win_loss_ratio'.

CodePudding user response：

You could use groupby with agg, passing count for the number of bets per user and mean for the share of bets won:

out = df.groupby('user_id', as_index=False).agg(count=('win','count'), win_ratio=('win','mean'))

Output:

   user_id  count  win_ratio
0     1234      3   0.666667
1     3943      1   1.000000
2     5032      1   0.000000
3     5830      2   0.500000
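
If the separate win and loss totals are also needed, one possible extension of the above is to add a sum for the wins and subtract it from the count. This is only a sketch: it assumes the win column contains nothing but 0 and 1, and the names summary, Total_wins and Total_losses are illustrative choices, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
d = {'user_id': [1234, 5830, 3943, 1234, 5032, 5830, 1234],
     'win': [1, 0, 1, 0, 0, 1, 1]}
df = pd.DataFrame(data=d)

# Bets, wins and win share per user; 'win' is assumed to hold only 0 or 1.
summary = df.groupby('user_id', as_index=False).agg(
    count=('win', 'count'),
    Total_wins=('win', 'sum'),
    win_ratio=('win', 'mean'),
)

# Under that assumption, every bet that is not a win is a loss.
summary['Total_losses'] = summary['count'] - summary['Total_wins']
print(summary)
```

Deriving the losses by subtraction avoids a second pass over the data, but it would miscount if win ever held values other than 0 and 1.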

CodePudding user response：

Use GroupBy.agg with sum to count the wins (1), a lambda function to count the losses (0), and mean for the ratio:

df = (df.groupby('user_id', as_index=False)
        .agg(Total_wins=('win', 'sum'),
             Total_losses=('win', lambda x: x.eq(0).sum()),
             win_loss_ratio=('win', 'mean')))
print(df)

Output:

   user_id  Total_wins  Total_losses  win_loss_ratio
0     1234           2             1        0.666667
1     3943           1             0        1.000000
2     5032           0             1        0.000000
3     5830           1             1        0.500000
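
Note that the mean of a 0/1 column is the share of bets won (wins divided by total bets). If a literal wins-to-losses ratio is wanted instead, a variant along these lines might work. It is a sketch under assumptions: the helper columns is_win and is_loss and the name stats are made up for illustration, and returning NaN for users with no losses is just one possible way to handle the division by zero.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
d = {'user_id': [1234, 5830, 3943, 1234, 5032, 5830, 1234],
     'win': [1, 0, 1, 0, 0, 1, 1]}
df = pd.DataFrame(data=d)

# Boolean helper columns let the built-in 'sum' do the counting (no lambda needed).
stats = (
    df.assign(is_win=df['win'].eq(1), is_loss=df['win'].eq(0))
      .groupby('user_id', as_index=False)
      .agg(Total_wins=('is_win', 'sum'), Total_losses=('is_loss', 'sum'))
      .rename(columns={'user_id': 'User'})
)

# Wins divided by losses; a user with zero losses gets NaN rather than inf.
stats['win_loss_ratio'] = stats['Total_wins'] / stats['Total_losses'].replace(0, np.nan)
print(stats)
```

The rename step gives the 'User' column name asked for in the question; drop it if user_id should be kept.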