Home > Back-end >  Python - creating column containing the name of a team's opponent based on info in dataframe
Python - creating column containing the name of a team's opponent based on info in dataframe

Time:10-27

I have a dataframe (called "games") that contains a play-by-play list of a basketball games, where I record scoring streaks within single games (GameID identifies a specific game). The dataframe is sorted by matches (i.e., GameID).

Example of dataset "games":

    GameID  TeamID  Scoring Streak 
0   nbaG1     A        23     
1   nbaG1     B        12   
2   nbaG1     B        11   
3   nbaG1     A        24  
4   nbaG1     B        21 
5   nbaG2     C        15  
6   nbaG2     C        12 
7   nbaG2     D        17 
8   nbaG2     C        11
9   nbaG2     D        21 
10  nbaG3     E        10 
11  nbaG3     F        12  
12  nbaG3     F        14

From this dataframe I would like to simply create a column that displays the name of the opposing team within the respective match. For example in Game1 (nbaG1) it is Team A vs B. So, if Team A scores the new column "opponents" should say "B". However I have no idea how to scan for names within each game and return the value of the opposing team...nor have I found tips in other threads.

Desired output for my dataset "games":

    GameID  TeamID  Scoring Streak  Opponents
0   nbaG1     A        23              B
1   nbaG1     B        12              A
2   nbaG1     B        11              A
3   nbaG1     A        24              B
4   nbaG1     B        21              A
5   nbaG2     C        15              D
6   nbaG2     C        12              D
7   nbaG2     D        17              C
8   nbaG2     C        11              D
9   nbaG2     D        21              C
10  nbaG3     E        10              F
11  nbaG3     F        12              E
12  nbaG3     F        14              E

CodePudding user response:

First, group your dataframe by GameID and invoke .unique() to get the two teams that are playing the game

teams = df.groupby("GameID")["TeamID"].unique()

# game_teams :
GameID
nbaG1    [A, B]
nbaG2    [C, D]
nbaG3    [E, F]

Then, use this to look up both teams in each game and add the column to your original dataframe:

df["Teams"] = teams[df["GameID"]].to_list()

# df: 
   GameID TeamID  Scoring Streak   Teams
0   nbaG1      A              23  [A, B]
1   nbaG1      B              12  [A, B]
2   nbaG1      B              11  [A, B]
3   nbaG1      A              24  [A, B]
4   nbaG1      B              21  [A, B]
5   nbaG2      C              15  [C, D]
6   nbaG2      C              12  [C, D]
7   nbaG2      D              17  [C, D]
8   nbaG2      C              11  [C, D]
9   nbaG2      D              21  [C, D]
10  nbaG3      E              10  [E, F]
11  nbaG3      F              12  [E, F]
12  nbaG3      F              14  [E, F]

Finally, apply a function to each row that takes the element from Teams that is not in TeamID

def select_opponent(row):
    for team in row["Teams"]:
        if team != row["TeamID"]:
            return team
    return None

df["Opponent"] = df.apply(select_opponent, axis=1)

# df: 
   GameID TeamID  Scoring Streak   Teams Opponent
0   nbaG1      A              23  [A, B]        B
1   nbaG1      B              12  [A, B]        A
2   nbaG1      B              11  [A, B]        A
3   nbaG1      A              24  [A, B]        B
4   nbaG1      B              21  [A, B]        A
5   nbaG2      C              15  [C, D]        D
6   nbaG2      C              12  [C, D]        D
7   nbaG2      D              17  [C, D]        C
8   nbaG2      C              11  [C, D]        D
9   nbaG2      D              21  [C, D]        C
10  nbaG3      E              10  [E, F]        F
11  nbaG3      F              12  [E, F]        E
12  nbaG3      F              14  [E, F]        E

CodePudding user response:

For a general approach that would also work with more than two teams per game, you could use sets: First compute the set of teams for each game, then get the opponents for each row by computing the set difference between the set of teams and the set only containing the team itself.

teams_per_game = games.groupby('GameID')['TeamID'].apply(set)

opponents = []
for i in games.index:
    opponents.append((teams_per_game.loc[games.loc[i, 'GameID']] 
                      - {games.loc[i, 'TeamID']}).pop())

games['Opponents'] = opponents
  • Related