How to find common values in groupby groups?

I have a DataFrame in the format below. My goal is to find users who participate in more than one tournament and, ultimately, set each such user's 'val' to the value they first appear with. Initially, I thought I needed to group by 'tour', but that would then require some kind of intersection between the groups, and I'm not sure how to proceed. Alternatively, I could use pd.crosstab(df.user, df.tour), but I'm not sure how to continue from there either.

import pandas as pd

df = pd.DataFrame(data=[['jim', '1', '1', 10], ['john', '1', '1', 12], ['jack', '2', '1', 14], ['jim', '2', '1', 10],
                        ['mel', '3', '2', 20], ['jim', '3', '2', 10], ['mat', '4', '2', 14], ['nick', '4', '2', 20],
                        ['tim', '5', '3', 16], ['john', '5', '3', 10], ['lin', '6', '3', 16], ['mick', '6', '3', 20]],
                  columns=['user', 'game', 'tour', 'val'])

CodePudding user response:

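You can group by 'user', filter out groups with only one row, and then select the first row of each remaining group, like so. Note that len(g) counts rows rather than distinct tournaments; to require more than one tournament, test g['tour'].nunique() > 1 instead.

df.groupby('user').filter(lambda g: len(g) > 1).groupby('user').head(1)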

Output:

   user game tour  val
0   jim    1    1   10
1  john    1    1   12
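
If the requirement is strictly "more than one tournament" rather than "more than one row", one possible variant is to count distinct 'tour' values per user. This is only a sketch built on the sample data from the question; the names tours_per_user, multi, first_rows and multi_users are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['jim', '1', '1', 10], ['john', '1', '1', 12], ['jack', '2', '1', 14], ['jim', '2', '1', 10],
          ['mel', '3', '2', 20], ['jim', '3', '2', 10], ['mat', '4', '2', 14], ['nick', '4', '2', 20],
          ['tim', '5', '3', 16], ['john', '5', '3', 10], ['lin', '6', '3', 16], ['mick', '6', '3', 20]],
    columns=['user', 'game', 'tour', 'val'])

# Number of distinct tournaments per user, broadcast back onto every row.
tours_per_user = df.groupby('user')['tour'].transform('nunique')

# Keep only rows of users seen in more than one tournament.
multi = df[tours_per_user > 1]

# First appearance of each such user (relies on the current row order).
first_rows = multi.groupby('user').head(1)

# Hypothetical helper list: the qualifying user names.
multi_users = sorted(multi['user'].unique())
```

On this sample the result is the same two rows (jim and john), but a user who played two games within a single tournament would no longer be picked up.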

CodePudding user response:

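Since df is already sorted by tour, we can use groupby with transform('first'), which assigns each user's first 'val' to all of that user's rows (users who appear only once keep their value):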

df['val'] = df.groupby('user')['val'].transform('first')

Output:

    user game tour  val
0    jim    1    1   10
1   john    1    1   12
2   jack    2    1   14
3    jim    2    1   10
4    mel    3    2   20
5    jim    3    2   10
6    mat    4    2   14
7   nick    4    2   20
8    tim    5    3   16
9   john    5    3   12
10   lin    6    3   16
11  mick    6    3   20
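
The approach above depends on the rows already being in tournament order. If that cannot be guaranteed, a possible sketch is to sort explicitly before taking the first value. It assumes that "first appearance" means lowest 'tour', then lowest 'game', and that both columns hold digit strings that can be cast to int; tour_n, game_n, order and first_val are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['jim', '1', '1', 10], ['john', '1', '1', 12], ['jack', '2', '1', 14], ['jim', '2', '1', 10],
          ['mel', '3', '2', 20], ['jim', '3', '2', 10], ['mat', '4', '2', 14], ['nick', '4', '2', 20],
          ['tim', '5', '3', 16], ['john', '5', '3', 10], ['lin', '6', '3', 16], ['mick', '6', '3', 20]],
    columns=['user', 'game', 'tour', 'val'])

# 'tour' and 'game' are strings, so sort on integer copies (assumption:
# they contain only digits); string sorting would place '10' before '2'.
order = (df.assign(tour_n=df['tour'].astype(int), game_n=df['game'].astype(int))
           .sort_values(['tour_n', 'game_n'])
           .index)

# First 'val' per user in tournament order.
first_val = df.loc[order].groupby('user')['val'].transform('first')

# Assignment aligns on the index, so the original row order is preserved.
df['val'] = first_val
```

Because the assignment aligns on the index, df keeps its original row order even if the sort had to rearrange rows.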