I'm stuck with the following problem since last night and I haven't found any solution anywhere.
Given the dataframe df
:
team1 team2 score1 score2
0 A B 1 0
1 C A 3 2
2 B A 2 3
3 A C 2 1
I would like to pass a function that calculates moving average for the team1
BUT take into account both team1
and team2
columns.
The output for moving average of 2 would be:
team1 team2 score1 score2 mov_avg_a
0 A B 1 2 1 # for A
1 C A 3 2 1.5 # for C
2 B A 2 3 2.5 # for B
3 A C 2 1 2.5 # for A
My idea is to call .apply()
with custom function that would:
Step 1. Blend team1
and team2
columns into a temporary column tempA
with score1
and score2
values are returned if A (for example) is present, like so:
team1 team2 score1 score2 tempA
0 A B 1 0 1
1 C A 3 2 2
2 B A 2 3 3
3 A C 2 1 2
Step 2. Apply rolling(2)
to the tempA
to get the desired output as seen above.
I have tried creating this process and failed spectacularly.
I am aware that using apply()
in the case of large dataframe will be computationally expensive but I cannot think of a 'one line' solution here.
Thank you in advance for your insights.
Dataframe for tests:
df = pd.DataFrame(
{
'team1': ['A', 'C', 'B', 'A'],
'team2': ['B', 'A', 'A', 'C'],
'score1': [1, 3, 2, 2],
'score2': [0, 2, 3, 1]
}
)
EDIT:
Upon some further thoughts I think the best solution is to create two separate datasets for team1 and team2 each, perform calculations on them and merge them back if needed.
CodePudding user response:
Given the clarification in comments I'll suggest this...
In [2]: df
Out[2]:
team1 team2 score1 score2
0 A B 1 0
1 C A 3 2
2 B A 2 3
3 A C 2 1
In [3]: # restructure data frame
...: df_team_scores = pd.wide_to_long(df.assign(game_index=df.index),
...: ['team', 'score'],
...: i='game_index',
...: j='column_suffix')
...: df_team_scores
Out[3]:
team score
game_index column_suffix
0 1 A 1
1 1 C 3
2 1 B 2
3 1 A 2
0 2 B 0
1 2 A 2
2 2 A 3
3 2 C 1
In [4]: # restore proper of scores (in order of game_index)
...: # order by team first to make table easier to understand
...: df_team_scores = df_team_scores.reset_index().sort_values(['team', 'game_index'])
...: df_team_scores
Out[4]:
game_index column_suffix team score
0 0 1 A 1
5 1 2 A 2
6 2 2 A 3
3 3 1 A 2
4 0 2 B 0
2 2 1 B 2
1 1 1 C 3
7 3 2 C 1
In [5]: # Compute the rolling score
...: s_rolling_score = df_team_scores.groupby(by='team')['score'].rolling(2, min_periods=1).mean()
...: s_rolling_score
Out[5]:
team
A 0 1.0
5 1.5
6 2.5
3 2.5
B 4 0.0
2 1.0
C 1 3.0
7 2.0
Name: score, dtype: float64
In [6]: # Force indices to be compatible and merge back to team_scores data frame
...: df_team_scores['rolling_score'] = s_rolling_score.reset_index(level=0, drop=True)
...: df_team_scores
Out[6]:
game_index column_suffix team score rolling_score
0 0 1 A 1 1.0
5 1 2 A 2 1.5
6 2 2 A 3 2.5
3 3 1 A 2 2.5
4 0 2 B 0 0.0
2 2 1 B 2 1.0
1 1 1 C 3 3.0
7 3 2 C 1 2.0
Not quite a one-liner but does not rely on custom functions. If you need to merge this back into the original data frame, I'll leave it to someone else to figure it out.
CodePudding user response:
You can do wide_to_long
then pivot
and rolling
with mean
s = (pd.wide_to_long(df.assign(key=df.index),['team','score'],i = 'key',j='m').
reset_index().
pivot('key','team','score').
fillna(0).
rolling(2,min_periods=1, axis=1).mean())
Out[32]:
team A B C
key
0 1.0 0.5 0.0
1 2.0 1.0 1.5
2 3.0 2.5 1.0
3 2.0 1.0 0.5
df = df.join(s)