Moving average for value present in two dataframe columns in python-CodePudding

I'm stuck with the following problem since last night and I haven't found any solution anywhere.

Given the dataframe df:

  team1 team2 score1 score2
0     A     B      1      0
1     C     A      3      2
2     B     A      2      3
3     A     C      2      1

I would like to pass a function that calculates moving average for the team1 BUT take into account both team1 and team2 columns.

The output for moving average of 2 would be:

  team1 team2 score1 score2 mov_avg_a
0     A     B      1      2         1   # for A
1     C     A      3      2       1.5   # for C
2     B     A      2      3       2.5   # for B
3     A     C      2      1       2.5   # for A

My idea is to call .apply() with custom function that would:

Step 1. Blend team1 and team2 columns into a temporary column tempA with score1 and score2 values are returned if A (for example) is present, like so:

  team1 team2 score1 score2 tempA
0     A     B      1      0     1
1     C     A      3      2     2
2     B     A      2      3     3
3     A     C      2      1     2

Step 2. Apply rolling(2) to the tempA to get the desired output as seen above.

I have tried creating this process and failed spectacularly.

I am aware that using apply() in the case of large dataframe will be computationally expensive but I cannot think of a 'one line' solution here.

Thank you in advance for your insights.

Dataframe for tests:

df = pd.DataFrame(
    {
        'team1': ['A', 'C', 'B', 'A'],
        'team2': ['B', 'A', 'A', 'C'],
        'score1': [1, 3, 2, 2],
        'score2': [0, 2, 3, 1]
    }
    )

EDIT:

Upon some further thoughts I think the best solution is to create two separate datasets for team1 and team2 each, perform calculations on them and merge them back if needed.

CodePudding user response：

Given the clarification in comments I'll suggest this...

In [2]: df
Out[2]:
  team1 team2  score1  score2
0     A     B       1       0
1     C     A       3       2
2     B     A       2       3
3     A     C       2       1

In [3]: # restructure data frame
   ...: df_team_scores = pd.wide_to_long(df.assign(game_index=df.index),
   ...:         ['team', 'score'],
   ...:         i='game_index',
   ...:         j='column_suffix')
   ...: df_team_scores
Out[3]:
                         team  score
game_index column_suffix
0          1                A      1
1          1                C      3
2          1                B      2
3          1                A      2
0          2                B      0
1          2                A      2
2          2                A      3
3          2                C      1

In [4]: # restore proper of scores (in order of game_index)
   ...: # order by team first to make table easier to understand
   ...: df_team_scores = df_team_scores.reset_index().sort_values(['team', 'game_index'])
   ...: df_team_scores
Out[4]:
   game_index  column_suffix team  score
0           0              1    A      1
5           1              2    A      2
6           2              2    A      3
3           3              1    A      2
4           0              2    B      0
2           2              1    B      2
1           1              1    C      3
7           3              2    C      1

In [5]: # Compute the rolling score
   ...: s_rolling_score = df_team_scores.groupby(by='team')['score'].rolling(2, min_periods=1).mean()
   ...: s_rolling_score
Out[5]:
team
A     0    1.0
      5    1.5
      6    2.5
      3    2.5
B     4    0.0
      2    1.0
C     1    3.0
      7    2.0
Name: score, dtype: float64

In [6]: # Force indices to be compatible and merge back to team_scores data frame
   ...: df_team_scores['rolling_score'] = s_rolling_score.reset_index(level=0, drop=True)
   ...: df_team_scores
Out[6]:
   game_index  column_suffix team  score  rolling_score
0           0              1    A      1            1.0
5           1              2    A      2            1.5
6           2              2    A      3            2.5
3           3              1    A      2            2.5
4           0              2    B      0            0.0
2           2              1    B      2            1.0
1           1              1    C      3            3.0
7           3              2    C      1            2.0

Not quite a one-liner but does not rely on custom functions. If you need to merge this back into the original data frame, I'll leave it to someone else to figure it out.

CodePudding user response：

You can do wide_to_long then pivot and rolling with mean

s = (pd.wide_to_long(df.assign(key=df.index),['team','score'],i = 'key',j='m').
     reset_index().
     pivot('key','team','score').
     fillna(0).
     rolling(2,min_periods=1, axis=1).mean())
Out[32]: 
team    A    B    C
key                
0     1.0  0.5  0.0
1     2.0  1.0  1.5
2     3.0  2.5  1.0
3     2.0  1.0  0.5
df = df.join(s)