Home > Software design >  Comparing rows in pandas dataframe where other columns are common
Comparing rows in pandas dataframe where other columns are common

Time:10-29

I have the following dataframe of athletes competing in races.

   pos   country       year    round      event       athlete    Battle
   1      Jamaica      2012    100m      Beijing     BOLT         0
   2      Jamaica      2012    100m      Beijing     BLAKE        0
   4       USA         2012    100m      Beijing     GAY          0
   3       USA         2012    100m      Beijing     JOHN         0
   6      Jamaica      2012    200m      Beijing     BOLT         0
   7      Jamaica      2012    200m      Beijing     BLAKE        0
   3       USA         2012    200m      Beijing      GAY         0
   4       USA         2012    200m      Beijing     JOHN         0

I would like to assign values to the 'Battle' column that denotes an athlete got a higher grid spot. Something like this:

   pos   country      year    round     event       athlete    Battle
   1      Jamaica     2012     100m      Beijing    BOLT         1
   2       Jamaica     2012    100m      Beijing    BLAKE        0
   4       USA         2012    100m      Beijing    GAY          0
   3       USA         2012    100m      Beijing    JOHN         1
   6       Jamaica     2012    200m      Beijing    BOLT         2
   7       Jamaica      2012   200m      Beijing    BLAKE        0
   3       USA         2012    200m      Beijing    GAY          1
   4       USA         2012    200m      Beijing    JOHN         1

Bolt has a better position than Blake in both races so his battle value after 2 rounds will be 2 compared to Blake's zero. Gay and John performed better in one of the 2 races, so they have a value of 1 and 1.

How would one compute the Battle column in pandas?? I am not sure where to begin. I have looked at other questions which include my own e.g. Assigning value to pandas dataframe values for unique values in another column and Compare values in two different pandas columns but these comparisons and/or operations span over two columns, whereas my desired result forces me to consider multiple columns (year, round, event, country) which must be common, and find unique values in the athlete column and compare.

I have tried to isolate athletes by country using df.groupby(['country', 'round', 'pos']).apply(lambda x: x['athlete'])

  country          round    pos      athlete    
   Jamaica         100m      1        BOLT         
                             2        BLAKE   
   USA             100m      4        GAY         
                             3        JOHN     
   Jamaica         200m      6        BOLT         
                             7        BLAKE       
   USA             200m      3        GAY         
                             4        JOHN 

While this isolates the important info, I am still confused on how to compare their positions. Any help would be appreciated.

CodePudding user response:

IIUC, you can:

  1. groupby country and round, and rank the athletes to see how many other athletes they outperformed.
  2. groupby athlete and sum the ranks.
df["rank"] = df.groupby(["country", "round"])["pos"].transform(pd.Series.rank, ascending=False, method="dense")-1
df["Battle"] = df.groupby("athlete")["rank"].transform(sum)

>>> df.drop("rank", axis=1)

   pos  country  year round    event athlete  Battle
0    1  Jamaica  2012  100m  Beijing    BOLT     2.0
1    2  Jamaica  2012  100m  Beijing   BLAKE     0.0
2    4      USA  2012  100m  Beijing     GAY     1.0
3    3      USA  2012  100m  Beijing    JOHN     1.0
4    6  Jamaica  2012  200m  Beijing    BOLT     2.0
5    7  Jamaica  2012  200m  Beijing   BLAKE     0.0
6    3      USA  2012  200m  Beijing     GAY     1.0
7    4      USA  2012  200m  Beijing    JOHN     1.0
​~~~
  • Related