Consider the following DataFrame:
team player metric_a metric_b
0 1 player1 15 25
1 1 player2 20 30
2 1 player3 21 12
3 2 player4 9 46
4 2 player5 39 5
How could we compute the (absolute) difference of every player with every other player of the same team?
Output:
team player_a player_b metric_a_diff metric_b_diff
0 1 player1 player2 5 10
1 1 player1 player3 6 13
2 1 player2 player3 1 18
3 2 player4 player5 30 41
I have tried something along the lines of:
from itertools import combinations
combos = lambda s : pd.DataFrame(list(combinations(s.values, 2)), columns=['player_a', 'player_b'])
df.groupby('team')['player'].apply(combos).reset_index(level=1, drop=True).reset_index()
but I don't know how to get diff
in to play.
CodePudding user response:
Let's try something new...self merge
the dataframe on team
to create all possible combinations then calculate the diff by subtracting the columns from left and right dataframes
s = df.reset_index()
s = (
s.merge(s, on='team')
.query('index_y > index_x') # remove duplicate combinations
.drop(columns=['index_x', 'index_y']) # drop unwanted columns
)
s['metric_a_diff'] = s.pop('metric_a_y').sub(s.pop('metric_a_x')).abs()
s['metric_b_diff'] = s.pop('metric_b_y').sub(s.pop('metric_b_x')).abs()
Result
team player_x player_y metric_a_diff metric_b_diff
1 1 player1 player2 5 5
2 1 player1 player3 6 13
5 1 player2 player3 1 18
10 2 player4 player5 30 41
CodePudding user response:
Code:
new = pd.DataFrame(columns=['player_a', 'player_b','metric_a_diff','metric_b_diff'])
def combos(ls):
for i,j in enumerate(ls):
if i%2==0:
new.loc[len(new)] =[j[0],ls[i 1][0], abs(j[1]-ls[i 1][1]), abs(j[2]-ls[i 1][2])]
df.groupby('team')[['player','metric_a','metric_b']].apply(lambda p : combos([l for ls in list(combinations(p.values, 2)) for l in ls]))
new