Home > Back-end >  create new column for each row related previous matches
create new column for each row related previous matches

Time:12-25

I have dataset for matches and home team result at the current match

match_date  home    away    home_result   
2021-11-22  team1   team2   Win
2021-11-22  team3   team4   Win 
2021-11-23  team1   team8   Lose
2021-11-23  team6   team7   Win
2021-11-25  team1   team2   Win 
2021-11-25  team3   team8   Lose 
2021-11-25  team1   team5   Lose 
2021-11-25  team6   team5   Win 
2021-11-28  team3   team1   Lose 
2021-11-29  team1   team5   Win 
2021-11-29  team6   team9   Win 

I want to create new column, where I can put previous result for each home team before the current match for example team1 played at 2021-11-22 (no previous matches) and 2021-11-23 (previous match team1 Win) and 2021-11-25 (previous matches team1 Win, Lose) and 2021-11-29 (previous matches team1 Win, Lose, Lose) This is the expected column:

match_date  home    away    home_result   home_team_previous_results
2021-11-22  team1   team2   Win           NaN
2021-11-22  team3   team4   Win           NaN
2021-11-23  team1   team8   Lose          [("Win","2021-11-22")]  
2021-11-23  team6   team7   Win           NaN 
2021-11-25  team1   team2   Win           [("Win","2021-11-22"), ("Lose","2021-11-23")]
2021-11-25  team3   team8   Lose          [("Win","2021-11-22")]
2021-11-25  team1   team5   Lose          [("Win","2021-11-22"), ("Lose","2021-11-23"), ("Win","2021-11-25")]
2021-11-25  team6   team5   Win           [("Win","2021-11-23")]
2021-11-28  team3   team1   Lose          [("Win","2021-11-22"), ("Lose","2021-11-25")]
2021-11-29  team1   team5   Win           [("Win","2021-11-22"), ("Lose","2021-11-23"), ("Win","2021-11-25"), ("Lose","2021-11-25")]
2021-11-29  team6   team9   Win           [("Win","2021-11-23"), ("Win","2021-11-25")]

CodePudding user response:

A gnarly solution:

df['home_team_previous_results'] = (
    df.groupby('home')
    .apply(
        lambda x: pd.Series(
            [
                [
                    tuple([row[col] for col in ['home_result', 'match_date']])
                    for _, row in x.iloc[0:i].iterrows()
                ] or np.nan
                for i in range(len(x))
            ],
        index=x.index)
    ).droplevel(0)
)

Output:

>>> df
    match_date   home   away home_result                         home_team_previous_results
0   2021-11-22  team1  team2         Win                                                NaN
1   2021-11-22  team3  team4         Win                                                NaN
2   2021-11-23  team1  team8        Lose                                [(Win, 2021-11-22)]
3   2021-11-23  team6  team7         Win                                                NaN
4   2021-11-25  team1  team2         Win            [(Win, 2021-11-22), (Lose, 2021-11-23)]
5   2021-11-25  team3  team8        Lose                                [(Win, 2021-11-22)]
6   2021-11-25  team1  team5        Lose  [(Win, 2021-11-22), (Lose, 2021-11-23), (Win, ...
7   2021-11-25  team6  team5         Win                                [(Win, 2021-11-23)]
8   2021-11-28  team3  team1        Lose            [(Win, 2021-11-22), (Lose, 2021-11-25)]
9   2021-11-29  team1  team5         Win  [(Win, 2021-11-22), (Lose, 2021-11-23), (Win, ...
10  2021-11-29  team6  team9         Win             [(Win, 2021-11-23), (Win, 2021-11-25)]

One-liner version:

df['home_team_previous_results'] = df.groupby('home').apply(lambda x: pd.Series([[tuple([row[col] for col in ['home_result', 'match_date']]) for _, row in x.iloc[0:i].iterrows()] or np.nan for i in range(len(x))], index=x.index)).droplevel(0)

CodePudding user response:

Unfortunately, I don't believe Pandas supports an efficient solution.

assert isinstance(df.index, pd.RangeIndex)  # This solution assumes a RangeIndex

df['home_team_previous_results'] = pd.Series(dtype=object)

team_frames = dict(list(df.groupby('home')))

for i, row in df.iterrows():
    previous = team_frames[row['home']].loc[:i-1, ['home_result', 'match_date']]
    records = list(previous.to_records(index=False)) or float('nan')
    df.at[i, 'home_team_previous_results'] = records
  • Related