I have a dataframe (called "games") that contains a play-by-play list of a basketball games, where I record scoring streaks within single games (GameID identifies a specific game). The dataframe is sorted by matches (i.e., GameID).
Example of dataset "games":
GameID TeamID Scoring Streak
0 nbaG1 A 23
1 nbaG1 B 12
2 nbaG1 B 11
3 nbaG1 A 24
4 nbaG1 B 21
5 nbaG2 C 15
6 nbaG2 C 12
7 nbaG2 D 17
8 nbaG2 C 11
9 nbaG2 D 21
10 nbaG3 E 10
11 nbaG3 F 12
12 nbaG3 F 14
From this dataframe I would like to simply create a column that displays the name of the opposing team within the respective match. For example in Game1 (nbaG1) it is Team A vs B. So, if Team A scores the new column "opponents" should say "B". However I have no idea how to scan for names within each game and return the value of the opposing team...nor have I found tips in other threads.
Desired output for my dataset "games":
GameID TeamID Scoring Streak Opponents
0 nbaG1 A 23 B
1 nbaG1 B 12 A
2 nbaG1 B 11 A
3 nbaG1 A 24 B
4 nbaG1 B 21 A
5 nbaG2 C 15 D
6 nbaG2 C 12 D
7 nbaG2 D 17 C
8 nbaG2 C 11 D
9 nbaG2 D 21 C
10 nbaG3 E 10 F
11 nbaG3 F 12 E
12 nbaG3 F 14 E
CodePudding user response:
First, group your dataframe by GameID and invoke .unique()
to get the two teams that are playing the game
teams = df.groupby("GameID")["TeamID"].unique()
# game_teams :
GameID
nbaG1 [A, B]
nbaG2 [C, D]
nbaG3 [E, F]
Then, use this to look up both teams in each game and add the column to your original dataframe:
df["Teams"] = teams[df["GameID"]].to_list()
# df:
GameID TeamID Scoring Streak Teams
0 nbaG1 A 23 [A, B]
1 nbaG1 B 12 [A, B]
2 nbaG1 B 11 [A, B]
3 nbaG1 A 24 [A, B]
4 nbaG1 B 21 [A, B]
5 nbaG2 C 15 [C, D]
6 nbaG2 C 12 [C, D]
7 nbaG2 D 17 [C, D]
8 nbaG2 C 11 [C, D]
9 nbaG2 D 21 [C, D]
10 nbaG3 E 10 [E, F]
11 nbaG3 F 12 [E, F]
12 nbaG3 F 14 [E, F]
Finally, apply
a function to each row that takes the element from Teams
that is not in TeamID
def select_opponent(row):
for team in row["Teams"]:
if team != row["TeamID"]:
return team
return None
df["Opponent"] = df.apply(select_opponent, axis=1)
# df:
GameID TeamID Scoring Streak Teams Opponent
0 nbaG1 A 23 [A, B] B
1 nbaG1 B 12 [A, B] A
2 nbaG1 B 11 [A, B] A
3 nbaG1 A 24 [A, B] B
4 nbaG1 B 21 [A, B] A
5 nbaG2 C 15 [C, D] D
6 nbaG2 C 12 [C, D] D
7 nbaG2 D 17 [C, D] C
8 nbaG2 C 11 [C, D] D
9 nbaG2 D 21 [C, D] C
10 nbaG3 E 10 [E, F] F
11 nbaG3 F 12 [E, F] E
12 nbaG3 F 14 [E, F] E
CodePudding user response:
For a general approach that would also work with more than two teams per game, you could use sets: First compute the set of teams for each game, then get the opponents for each row by computing the set difference between the set of teams and the set only containing the team itself.
teams_per_game = games.groupby('GameID')['TeamID'].apply(set)
opponents = []
for i in games.index:
opponents.append((teams_per_game.loc[games.loc[i, 'GameID']]
- {games.loc[i, 'TeamID']}).pop())
games['Opponents'] = opponents