Home > Enterprise >  New column based on interaction of two related past rows in Pandas dataframe
New column based on interaction of two related past rows in Pandas dataframe

Time:10-03

I have a dataframe of race results of MMA fighters sorted in descending order of date that looks like

Race_ID  Fighter_ID  Date        Result
1        1           2022-05-17  1
1        2           2022-05-17  0
2        1           2022-04-17  0
2        3           2022-04-17  1
3        2           2022-03-11  1
3        1           2022-03-11  0
4        2           2022-02-11  1
4        4           2022-02-11  0
5        3           2022-02-08  1
5        1           2022-02-08  0
6        2           2022-01-11  1
6        4           2022-01-11  0
7        1           2022-01-01  1
7        2           2022-01-01  0

where 1 means win and 0 means lose in that race. I want to add a new column called history that equals 1 if the fighter has actually fought with the opponent in the current race and won the recent past race, and 0 otherwise. Hence the desired outcome looks like

Race_ID  Fighter_ID  Date        Result  history
1        1           2022-05-17  1       0 #Since fighter1 fought with fighter2 in race 3 and lost
1        2           2022-05-17  0       1 #Since fighter2 fought with fighter1 in race 3 and won
2        1           2022-04-17  0       0 #Since fighter1 fought with fighter3 in race 5 and lost
2        3           2022-04-17  1       1 #Since fighter3 fought with fighter1 in race 5 and won
3        2           2022-03-11  1       0 #Since fighter2 fought with fighter1 in race 7 and lost 
3        1           2022-03-11  0       1 #Since fighter1 fought with fighter2 in race 7 and won
4        2           2022-02-11  1       1
4        4           2022-02-11  0       0
5        3           2022-02-08  1       0 #Since there they have not fought before
5        1           2022-02-08  0       0 #Since there they have not fought before
6        2           2022-01-11  1       0 #Since there they have not fought before
6        4           2022-01-11  0       0 #Since there they have not fought before
7        1           2022-01-01  1       0 #Since there they have not fought before
7        2           2022-01-01  0       0 #Since there they have not fought before

CodePudding user response:

Simplier solution is create helper Series by frozenset and used for DataFrameGroupBy.shift:

g = df['Race_ID'].map(df.groupby('Race_ID')['Fighter_ID'].agg(frozenset))
df['History'] = df.groupby(['Fighter_ID', g])['Result'].shift(-1, fill_value=0)

print (df)
    Race_ID  Fighter_ID       Date  Result  History
0         1           1 2022-05-17       1        0
1         1           2 2022-05-17       0        1
2         2           1 2022-04-17       0        0
3         2           3 2022-04-17       1        1
4         3           2 2022-03-11       1        0
5         3           1 2022-03-11       0        1
6         4           2 2022-02-11       1        1
7         4           4 2022-02-11       0        0
8         5           3 2022-02-08       1        0
9         5           1 2022-02-08       0        0
10        6           2 2022-01-11       1        0
11        6           4 2022-01-11       0        0
12        7           1 2022-01-01       1        0
13        7           2 2022-01-01       0        0

Original solution:

Idea is working with pairs - by column Fighter_ID and Result, so first sorting by 3 columns for same ordering in Fighter_ID, create column History by tuples which are shifted per groups by column Fighter_IDand then convert tuples to columns with append History column to original DataFrame (because different ordering of Fighter_ID in original DataFrame):

df1 = (df.sort_values(['Race_ID','Date','Fighter_ID'])
        .groupby('Race_ID')[['Fighter_ID','Result']]
        .agg(tuple)
        .assign(History = lambda x: x.groupby('Fighter_ID')['Result'].shift(-1))
        .dropna(subset=['History'])
        .reset_index()
        .drop('Result', axis=1)
        .apply(pd.Series.explode))

df = df.merge(df1, on=['Race_ID','Fighter_ID'], how='left').fillna({'History':0})
print (df)
    Race_ID Fighter_ID       Date  Result  History
0         1          1 2022-05-17       1        0
1         1          2 2022-05-17       0        1
2         2          1 2022-04-17       0        0
3         2          3 2022-04-17       1        1
4         3          2 2022-03-11       1        0
5         3          1 2022-03-11       0        1
6         4          2 2022-02-11       1        1
7         4          4 2022-02-11       0        0
8         5          3 2022-02-08       1        0
9         5          1 2022-02-08       0        0
10        6          2 2022-01-11       1        0
11        6          4 2022-01-11       0        0
12        7          1 2022-01-01       1        0
13        7          2 2022-01-01       0        0
  • Related