Home > OS >  Walking average based on two matching columns
Walking average based on two matching columns

Time:11-16

I have a dataframe df of the following format:

    team1 team2 score1 score2
0       1     2      1      0
1       3     4      3      0
2       1     3      1      1
3       2     4      0      2
4       1     2      3      2

What I want to do is to create a new column that will return rolling average of the score1 column of last 3 games but only when the two teams from team1 and team2 are matching.

Expected output:

    team1 team2 score1 score2 new
0       1     2      1      0   1
1       3     4      3      0   3
2       1     3      1      1   1
3       2     4      0      2   0
4       1     2      3      2   2

I was able to calculate walking average for all games for each team separately like that:

df['new'] = df.groupby('team1')['score1'].transform(lambda x: x.rolling(3, min_periods=1).mean()

but cannot find a sensible way to expand that to match two teams.

I tried the code below that returns... something, but definitely not what I need.

df['new'] = df.groupby(['team1','team2'])['score1'].transform(lambda x: x.rolling(3, min_periods=1).mean()

I suppose this could be done with apply() but I want to avoid it due to performace issues.

CodePudding user response:

Not sure what is your exact expected output, but you can first reshape the DataFrame to a long format:

(pd.wide_to_long(df.reset_index(), ['team', 'score'], i='index', j='x')
   .groupby('team')['score']
   .rolling(3, min_periods=1).mean()
)

Output:

team  index  x
1     0      1    1.0
      2      1    1.0
2     3      1    0.0
      0      2    0.0
3     1      1    3.0
      2      2    2.0
4     1      2    0.0
      3      2    1.0
Name: score, dtype: float64

CodePudding user response:

The walkaround I've found was to create 'temp' column that merges the values in 'team1' and 'team2' and uses that column as a reference for the rolling average.

df['temp'] = df.team1 '_' df.team2
df['new'] = df.groupby('temp')['score1'].transform(lambda x: x.rolling(3, min_periods=1).mean()

Can this be done in one line?

  • Related