I have dataset for matches and home team result at the current match
match_date home away home_result
2021-11-22 team1 team2 Win
2021-11-22 team3 team4 Win
2021-11-23 team1 team8 Lose
2021-11-23 team6 team7 Win
2021-11-25 team1 team2 Win
2021-11-25 team3 team8 Lose
2021-11-25 team1 team5 Lose
2021-11-25 team6 team5 Win
2021-11-28 team3 team1 Lose
2021-11-29 team1 team5 Win
2021-11-29 team6 team9 Win
I want to create new column, where I can put previous result for each home team before the current match for example team1 played at 2021-11-22 (no previous matches) and 2021-11-23 (previous match team1 Win) and 2021-11-25 (previous matches team1 Win, Lose) and 2021-11-29 (previous matches team1 Win, Lose, Lose) This is the expected column:
match_date home away home_result home_team_previous_results
2021-11-22 team1 team2 Win NaN
2021-11-22 team3 team4 Win NaN
2021-11-23 team1 team8 Lose [("Win","2021-11-22")]
2021-11-23 team6 team7 Win NaN
2021-11-25 team1 team2 Win [("Win","2021-11-22"), ("Lose","2021-11-23")]
2021-11-25 team3 team8 Lose [("Win","2021-11-22")]
2021-11-25 team1 team5 Lose [("Win","2021-11-22"), ("Lose","2021-11-23"), ("Win","2021-11-25")]
2021-11-25 team6 team5 Win [("Win","2021-11-23")]
2021-11-28 team3 team1 Lose [("Win","2021-11-22"), ("Lose","2021-11-25")]
2021-11-29 team1 team5 Win [("Win","2021-11-22"), ("Lose","2021-11-23"), ("Win","2021-11-25"), ("Lose","2021-11-25")]
2021-11-29 team6 team9 Win [("Win","2021-11-23"), ("Win","2021-11-25")]
CodePudding user response:
A gnarly solution:
df['home_team_previous_results'] = (
df.groupby('home')
.apply(
lambda x: pd.Series(
[
[
tuple([row[col] for col in ['home_result', 'match_date']])
for _, row in x.iloc[0:i].iterrows()
] or np.nan
for i in range(len(x))
],
index=x.index)
).droplevel(0)
)
Output:
>>> df
match_date home away home_result home_team_previous_results
0 2021-11-22 team1 team2 Win NaN
1 2021-11-22 team3 team4 Win NaN
2 2021-11-23 team1 team8 Lose [(Win, 2021-11-22)]
3 2021-11-23 team6 team7 Win NaN
4 2021-11-25 team1 team2 Win [(Win, 2021-11-22), (Lose, 2021-11-23)]
5 2021-11-25 team3 team8 Lose [(Win, 2021-11-22)]
6 2021-11-25 team1 team5 Lose [(Win, 2021-11-22), (Lose, 2021-11-23), (Win, ...
7 2021-11-25 team6 team5 Win [(Win, 2021-11-23)]
8 2021-11-28 team3 team1 Lose [(Win, 2021-11-22), (Lose, 2021-11-25)]
9 2021-11-29 team1 team5 Win [(Win, 2021-11-22), (Lose, 2021-11-23), (Win, ...
10 2021-11-29 team6 team9 Win [(Win, 2021-11-23), (Win, 2021-11-25)]
One-liner version:
df['home_team_previous_results'] = df.groupby('home').apply(lambda x: pd.Series([[tuple([row[col] for col in ['home_result', 'match_date']]) for _, row in x.iloc[0:i].iterrows()] or np.nan for i in range(len(x))], index=x.index)).droplevel(0)
CodePudding user response:
Unfortunately, I don't believe Pandas supports an efficient solution.
assert isinstance(df.index, pd.RangeIndex) # This solution assumes a RangeIndex
df['home_team_previous_results'] = pd.Series(dtype=object)
team_frames = dict(list(df.groupby('home')))
for i, row in df.iterrows():
previous = team_frames[row['home']].loc[:i-1, ['home_result', 'match_date']]
records = list(previous.to_records(index=False)) or float('nan')
df.at[i, 'home_team_previous_results'] = records