I have a df of the following format. My goal is to find users who participate in more than one tournament and ultimately set their 'val' value to the one they first appear with. Initially, I was thinking I need to groupby 'tour', but then it needs some kind of intersection and I'm not sure how to proceed. Alternatively, I can do pd.crosstab(df.user, df.tour), but I'm not sure how to proceed from there either.
import pandas as pd

df = pd.DataFrame(data=[['jim', '1', '1', 10], ['john', '1', '1', 12], ['jack', '2', '1', 14], ['jim', '2', '1', 10],
                        ['mel', '3', '2', 20], ['jim', '3', '2', 10], ['mat', '4', '2', 14], ['nick', '4', '2', 20],
                        ['tim', '5', '3', 16], ['john', '5', '3', 10], ['lin', '6', '3', 16], ['mick', '6', '3', 20]],
                  columns=['user', 'game', 'tour', 'val'])
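To follow the crosstab idea through, here is a minimal sketch. It assumes the row order of df is the order in which users first appear. The crosstab counts games per user per tournament, so a user with a non-zero count in more than one column plays in more than one tournament. The names ct, multi_users, first_val and mask are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(data=[['jim', '1', '1', 10], ['john', '1', '1', 12], ['jack', '2', '1', 14], ['jim', '2', '1', 10],
                        ['mel', '3', '2', 20], ['jim', '3', '2', 10], ['mat', '4', '2', 14], ['nick', '4', '2', 20],
                        ['tim', '5', '3', 16], ['john', '5', '3', 10], ['lin', '6', '3', 16], ['mick', '6', '3', 20]],
                  columns=['user', 'game', 'tour', 'val'])

# Rows = users, columns = tournaments, cells = number of games played in that tournament
ct = pd.crosstab(df.user, df.tour)

# A user plays in several tournaments if more than one column has a non-zero count
multi_users = ct.index[(ct > 0).sum(axis=1) > 1]

# 'val' from each user's first row (assumes rows are in order of appearance)
first_val = df.groupby('user')['val'].transform('first')

# Overwrite 'val' only for the multi-tournament users; assignment aligns on the index
mask = df['user'].isin(multi_users)
df.loc[mask, 'val'] = first_val[mask]
print(df)
```

Only john actually changes here (row 9 goes from 10 to 12), since jim already has the same value everywhere.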
Answer:
You can groupby on 'user', filter out groups with only one element, and then select the first row of each remaining group, like so:
df.groupby(['user']).filter(lambda g: len(g) > 1).groupby('user').head(1)
Output:
   user game tour  val
0   jim    1    1   10
1  john    1    1   12
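One caveat: len(g) > 1 counts rows (games), not tournaments, so a user who played two games in the same tour would also be kept. It gives the right answer for this data, but if "more than one tournament" is meant strictly, a sketch that counts distinct tours with transform('nunique') could look like this (n_tours, multi and first_rows are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(data=[['jim', '1', '1', 10], ['john', '1', '1', 12], ['jack', '2', '1', 14], ['jim', '2', '1', 10],
                        ['mel', '3', '2', 20], ['jim', '3', '2', 10], ['mat', '4', '2', 14], ['nick', '4', '2', 20],
                        ['tim', '5', '3', 16], ['john', '5', '3', 10], ['lin', '6', '3', 16], ['mick', '6', '3', 20]],
                  columns=['user', 'game', 'tour', 'val'])

# Number of distinct tournaments per user, broadcast back to every row of that user
n_tours = df.groupby('user')['tour'].transform('nunique')

# Keep only rows of users seen in more than one distinct tournament
multi = df[n_tours > 1]

# First row per remaining user, in the original row order
first_rows = multi.groupby('user').head(1)
print(first_rows)
```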
Answer:
Since df is already sorted by tour, we could use groupby with transform('first'):
df['val'] = df.groupby('user')['val'].transform('first')
Output:
    user game tour  val
0    jim    1    1   10
1   john    1    1   12
2   jack    2    1   14
3    jim    2    1   10
4    mel    3    2   20
5    jim    3    2   10
6    mat    4    2   14
7   nick    4    2   20
8    tim    5    3   16
9   john    5    3   12
10   lin    6    3   16
11  mick    6    3   20
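transform('first') takes the first value in the current row order, so the result depends on df being sorted. If that isn't guaranteed, a sketch like the following sorts by tournament and game before taking the first value. It assumes game numbers increase over time within a tour, and it casts both columns to int because they are stored as strings (as strings, '10' would sort before '2'). first_val_by_tour is a hypothetical helper name. Like the one-liner above, it rewrites 'val' for every user; users with a single tournament simply keep their own value.

```python
import pandas as pd

df = pd.DataFrame(data=[['jim', '1', '1', 10], ['john', '1', '1', 12], ['jack', '2', '1', 14], ['jim', '2', '1', 10],
                        ['mel', '3', '2', 20], ['jim', '3', '2', 10], ['mat', '4', '2', 14], ['nick', '4', '2', 20],
                        ['tim', '5', '3', 16], ['john', '5', '3', 10], ['lin', '6', '3', 16], ['mick', '6', '3', 20]],
                  columns=['user', 'game', 'tour', 'val'])

def first_val_by_tour(frame):
    # Numeric sort keys; 'tour' and 'game' are stored as strings
    keys = frame[['tour', 'game']].astype(int)
    # Row labels in chronological (tour, game) order
    order = keys.sort_values(['tour', 'game']).index
    out = frame.copy()
    # 'first' is evaluated in sorted order; the result keeps the original
    # row labels, so assigning it back aligns on the index
    out['val'] = frame.loc[order].groupby('user')['val'].transform('first')
    return out

# Shuffle the rows to show the result no longer depends on the input order
shuffled = df.sample(frac=1, random_state=0)
result = first_val_by_tour(shuffled).sort_index()
print(result)
```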