Home > Back-end >  How to calculate difference among every item within a group?
How to calculate difference among every item within a group?

Time:10-27

Consider the following DataFrame:

   team player  metric_a  metric_b
0   1   player1   15        25
1   1   player2   20        30
2   1   player3   21        12
3   2   player4   9         46
4   2   player5   39         5

How could we compute the (absolute) difference of every player with every other player of the same team?

Output:

    team  player_a player_b metric_a_diff metric_b_diff
0    1    player1  player2         5             10  
1    1    player1  player3         6             13
2    1    player2  player3         1             18
3    2    player4  player5         30            41

I have tried something along the lines of:

from itertools import combinations

combos = lambda s : pd.DataFrame(list(combinations(s.values, 2)), columns=['player_a', 'player_b'])


df.groupby('team')['player'].apply(combos).reset_index(level=1, drop=True).reset_index()

but I don't know how to get diff in to play.

CodePudding user response:

Let's try something new...self merge the dataframe on team to create all possible combinations then calculate the diff by subtracting the columns from left and right dataframes

s = df.reset_index()
s = (
    s.merge(s, on='team')
     .query('index_y > index_x') # remove duplicate combinations
     .drop(columns=['index_x', 'index_y']) # drop unwanted columns
)

s['metric_a_diff'] = s.pop('metric_a_y').sub(s.pop('metric_a_x')).abs()
s['metric_b_diff'] = s.pop('metric_b_y').sub(s.pop('metric_b_x')).abs()

Result

    team player_x player_y  metric_a_diff  metric_b_diff
1      1  player1  player2              5              5
2      1  player1  player3              6             13
5      1  player2  player3              1             18
10     2  player4  player5             30             41

CodePudding user response:

Code:

new = pd.DataFrame(columns=['player_a', 'player_b','metric_a_diff','metric_b_diff'])
def combos(ls):
    for i,j in enumerate(ls):
        if i%2==0:
            new.loc[len(new)] =[j[0],ls[i 1][0], abs(j[1]-ls[i 1][1]), abs(j[2]-ls[i 1][2])]

df.groupby('team')[['player','metric_a','metric_b']].apply(lambda p : combos([l for ls in list(combinations(p.values, 2)) for l in ls]))
new
  • Related