I have a dataframe df
of the following format:
team1 team2 score1 score2
0 1 2 1 0
1 3 4 3 0
2 1 3 1 1
3 2 4 0 2
4 1 2 3 2
What I want to do is to create a new
column that will return rolling average of the score1
column of last 3 games but only when the two teams from team1
and team2
are matching.
Expected output:
team1 team2 score1 score2 new
0 1 2 1 0 1
1 3 4 3 0 3
2 1 3 1 1 1
3 2 4 0 2 0
4 1 2 3 2 2
I was able to calculate walking average for all games for each team separately like that:
df['new'] = df.groupby('team1')['score1'].transform(lambda x: x.rolling(3, min_periods=1).mean()
but cannot find a sensible way to expand that to match two teams.
I tried the code below that returns... something, but definitely not what I need.
df['new'] = df.groupby(['team1','team2'])['score1'].transform(lambda x: x.rolling(3, min_periods=1).mean()
I suppose this could be done with apply() but I want to avoid it due to performace issues.
CodePudding user response:
Not sure what is your exact expected output, but you can first reshape the DataFrame to a long format:
(pd.wide_to_long(df.reset_index(), ['team', 'score'], i='index', j='x')
.groupby('team')['score']
.rolling(3, min_periods=1).mean()
)
Output:
team index x
1 0 1 1.0
2 1 1.0
2 3 1 0.0
0 2 0.0
3 1 1 3.0
2 2 2.0
4 1 2 0.0
3 2 1.0
Name: score, dtype: float64
CodePudding user response:
The walkaround I've found was to create 'temp' column that merges the values in 'team1' and 'team2' and uses that column as a reference for the rolling average.
df['temp'] = df.team1 '_' df.team2
df['new'] = df.groupby('temp')['score1'].transform(lambda x: x.rolling(3, min_periods=1).mean()
Can this be done in one line?