I am very new to pandas. I have two dataframes related to two player Game
DF1:matches # match information
match_num winner_id loser_id
270 201504 201595
271 201514 201426
272 201697 211901
273 201620 211539
274 214981 203564
DF2: players #information about the players
playerid First Name Last Name Country
200001 Martina Hingis SUI
200002 Mirjana Lucic CRO
200003 Justine Henin BEL
I need to find out Which player(s) won the highest number of matches in a row?
Can this be solved using sql functions?
Thanks in advance!!
CodePudding user response:
You can use Python only to solve this problem with groupby
:
from itertools import groupby
wid, nwin = max([(name, len(list(grp))) for name, grp in groupby(df1['winner_id'])],
key=lambda x: x[1])
Now you can lookup to your second dataframe:
df2.loc[df2['playerid'].eq(wid), ['First Name', 'Last Name']].apply(' '.join, axis=1)
CodePudding user response:
The example that you provided is not really useful as there is no more than one match per id. Here is a better one:
match_num winner_id loser_id
0 270 3 2
1 271 2 3
2 272 3 4
3 273 4 1
4 274 2 4
My proposed pandas solution is a bit lengthy. We first reshape the dataframe to a long format using melt
, and sort by match. Then we compute a group corresponding to consecutive matches of the same type (won/lost). Finally, we keep only the wins, count the group size (i.e. the number of consecutive matches won) and sort.
(df.melt(id_vars='match_num',
var_name='wl',
value_name='ID')
.sort_values(by=['ID', 'match_num'])
.assign(group=lambda d: d.groupby('ID')['wl'].apply(lambda s: s.ne(s.shift()).cumsum()))
.query('wl == "winner_id"')
.groupby(['ID', 'group']).size()
.sort_values(ascending=False)
.droplevel('group')
.rename('consecutive_win')
.to_frame()
)
Output:
consecutive_win
ID
2 2
3 1
3 1
4 1
CodePudding user response:
Something like this would work where the variable b_id is the player with the most wins in a row.
import numpy as np
b_id = None
_max = float('-inf')
for g,g_hold in df.groupby('winner_id'):
diff = np.diff(g_hold.index)
if len(np.where(diff>1)[0]) > 0:
count = len(diff[:np.where(diff>1)[0][0]])
elif len(diff) == 0:
count = len(g_hold)
else:
count = 0
if count > _max:
b_id = g
_max = count