I have a dataframe of race results of MMA fighters sorted in descending order of date that looks like
Race_ID Fighter_ID Date Result
1 1 2022-05-17 1
1 2 2022-05-17 0
2 1 2022-04-17 0
2 3 2022-04-17 1
3 2 2022-03-11 1
3 1 2022-03-11 0
4 2 2022-02-11 1
4 4 2022-02-11 0
5 3 2022-02-08 1
5 1 2022-02-08 0
6 2 2022-01-11 1
6 4 2022-01-11 0
7 1 2022-01-01 1
7 2 2022-01-01 0
where 1 means win and 0 means lose in that race. I want to add a new column called history
that equals 1 if the fighter has actually fought with the opponent in the current race and won the recent past race, and 0 otherwise. Hence the desired outcome looks like
Race_ID Fighter_ID Date Result history
1 1 2022-05-17 1 0 #Since fighter1 fought with fighter2 in race 3 and lost
1 2 2022-05-17 0 1 #Since fighter2 fought with fighter1 in race 3 and won
2 1 2022-04-17 0 0 #Since fighter1 fought with fighter3 in race 5 and lost
2 3 2022-04-17 1 1 #Since fighter3 fought with fighter1 in race 5 and won
3 2 2022-03-11 1 0 #Since fighter2 fought with fighter1 in race 7 and lost
3 1 2022-03-11 0 1 #Since fighter1 fought with fighter2 in race 7 and won
4 2 2022-02-11 1 1
4 4 2022-02-11 0 0
5 3 2022-02-08 1 0 #Since there they have not fought before
5 1 2022-02-08 0 0 #Since there they have not fought before
6 2 2022-01-11 1 0 #Since there they have not fought before
6 4 2022-01-11 0 0 #Since there they have not fought before
7 1 2022-01-01 1 0 #Since there they have not fought before
7 2 2022-01-01 0 0 #Since there they have not fought before
CodePudding user response:
Simplier solution is create helper Series by frozenset
and used for DataFrameGroupBy.shift
:
g = df['Race_ID'].map(df.groupby('Race_ID')['Fighter_ID'].agg(frozenset))
df['History'] = df.groupby(['Fighter_ID', g])['Result'].shift(-1, fill_value=0)
print (df)
Race_ID Fighter_ID Date Result History
0 1 1 2022-05-17 1 0
1 1 2 2022-05-17 0 1
2 2 1 2022-04-17 0 0
3 2 3 2022-04-17 1 1
4 3 2 2022-03-11 1 0
5 3 1 2022-03-11 0 1
6 4 2 2022-02-11 1 1
7 4 4 2022-02-11 0 0
8 5 3 2022-02-08 1 0
9 5 1 2022-02-08 0 0
10 6 2 2022-01-11 1 0
11 6 4 2022-01-11 0 0
12 7 1 2022-01-01 1 0
13 7 2 2022-01-01 0 0
Original solution:
Idea is working with pairs - by column Fighter_ID
and Result
, so first sorting by 3 columns for same ordering in Fighter_ID
, create column History
by tuples which are shifted per groups by column Fighter_ID
and then convert tuples to columns with append History
column to original DataFrame (because different ordering of Fighter_ID
in original DataFrame):
df1 = (df.sort_values(['Race_ID','Date','Fighter_ID'])
.groupby('Race_ID')[['Fighter_ID','Result']]
.agg(tuple)
.assign(History = lambda x: x.groupby('Fighter_ID')['Result'].shift(-1))
.dropna(subset=['History'])
.reset_index()
.drop('Result', axis=1)
.apply(pd.Series.explode))
df = df.merge(df1, on=['Race_ID','Fighter_ID'], how='left').fillna({'History':0})
print (df)
Race_ID Fighter_ID Date Result History
0 1 1 2022-05-17 1 0
1 1 2 2022-05-17 0 1
2 2 1 2022-04-17 0 0
3 2 3 2022-04-17 1 1
4 3 2 2022-03-11 1 0
5 3 1 2022-03-11 0 1
6 4 2 2022-02-11 1 1
7 4 4 2022-02-11 0 0
8 5 3 2022-02-08 1 0
9 5 1 2022-02-08 0 0
10 6 2 2022-01-11 1 0
11 6 4 2022-01-11 0 0
12 7 1 2022-01-01 1 0
13 7 2 2022-01-01 0 0