Home > Enterprise >  Maximum consecutive row for an id using Pandas?
Maximum consecutive row for an id using Pandas?

Time:10-07

I am very new to pandas. I have two dataframes related to two player Game

DF1:matches # match information 

match_num   winner_id   loser_id    
270      201504         201595
271      201514         201426
272      201697         211901
273      201620         211539
274      214981         203564
DF2: players #information about the players

playerid  First Name    Last Name   Country
200001     Martina      Hingis        SUI
200002     Mirjana      Lucic         CRO
200003     Justine      Henin         BEL

I need to find out Which player(s) won the highest number of matches in a row?

Can this be solved using sql functions?

Thanks in advance!!

CodePudding user response:

You can use Python only to solve this problem with groupby:

from itertools import groupby

wid, nwin = max([(name, len(list(grp))) for name, grp in groupby(df1['winner_id'])],
                key=lambda x: x[1])

Now you can lookup to your second dataframe:

df2.loc[df2['playerid'].eq(wid), ['First Name', 'Last Name']].apply(' '.join, axis=1)

CodePudding user response:

The example that you provided is not really useful as there is no more than one match per id. Here is a better one:

   match_num  winner_id  loser_id
0        270          3         2
1        271          2         3
2        272          3         4
3        273          4         1
4        274          2         4

My proposed pandas solution is a bit lengthy. We first reshape the dataframe to a long format using melt, and sort by match. Then we compute a group corresponding to consecutive matches of the same type (won/lost). Finally, we keep only the wins, count the group size (i.e. the number of consecutive matches won) and sort.

(df.melt(id_vars='match_num',
         var_name='wl',
         value_name='ID')
   .sort_values(by=['ID', 'match_num'])
   .assign(group=lambda d: d.groupby('ID')['wl'].apply(lambda s: s.ne(s.shift()).cumsum()))
   .query('wl == "winner_id"')
   .groupby(['ID', 'group']).size()
   .sort_values(ascending=False)
   .droplevel('group')
   .rename('consecutive_win')
   .to_frame()
 )

Output:

    consecutive_win
ID                 
2                 2
3                 1
3                 1
4                 1

CodePudding user response:

Something like this would work where the variable b_id is the player with the most wins in a row.

import numpy as np

b_id = None
_max = float('-inf')
for g,g_hold in df.groupby('winner_id'):
    diff = np.diff(g_hold.index)
    if len(np.where(diff>1)[0]) > 0:
        count = len(diff[:np.where(diff>1)[0][0]])
    elif len(diff) == 0:
        count = len(g_hold)
    else:
        count = 0
    if count > _max:
        b_id = g
        _max = count
  • Related