Update Pandas DataFrame for each string in a list

Time:06-06

I am analyzing a dataset containing NFL game results over the past 20 years and am trying to create a column denoting, for each team, whether each game was a home game or an away game (home game = 1, away game = 0).

The code I have so far is:

home_list = list(df.home_team.unique())
def home_or_away(team_name, dataf):
   dataf['home_or_away'] = np.where(dataf['home_team'] == team_name, 1, 0)
   return dataf

for i in home_list:
   home_update_all = home_or_away(i, df)
   df.update(home_update_all)

This doesn't yield the correct results: the column is simply overwritten for each team as the loop iterates. Any ideas on how to solve this?

Thanks!

CodePudding user response:

I'm not really sure what your expected output is. Do you want one column per team? Your loop keeps creating a column with the same name, so each iteration overwrites the previous one and only the result of the last iteration is kept. Or do you want multiple DataFrames?
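To see the overwriting in isolation, here is a minimal sketch on a made-up four-row frame (the data is an assumption for illustration, not the asker's dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real NFL dataset.
df = pd.DataFrame({'home_team': ['a', 'b', 'c', 'a']})

# Each pass writes to the same column, replacing the previous pass's values.
for team in df['home_team'].unique():
    df['home_or_away'] = np.where(df['home_team'] == team, 1, 0)

# Only the last team visited by the loop ('c') is still flagged.
print(df['home_or_away'].tolist())
```

After the loop, the column reflects only the final comparison, which is why the earlier teams appear to be lost.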

If you want multiple columns, one per team:

import pandas as pd

df = pd.DataFrame({'game': [1, 2, 3, 4], 'home_team': ['a', 'b', 'c', 'a']})
>    game home_team
  0     1         a
  1     2         b
  2     3         c
  3     4         a

First collect unique teams as you did:

home_list = list(df.home_team.unique())

Create a column for each team:

for team in home_list:
    df[f'home_or_away_{team}'] = [int(ht==team) for ht in df['home_team']]

Which results in:

>   game home_team  home_or_away_a  home_or_away_b  home_or_away_c
 0     1         a               1               0               0
 1     2         b               0               1               0
 2     3         c               0               0               1
 3     4         a               1               0               0
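If one column per team is really what you want, pandas can probably build them in one step with pd.get_dummies instead of a loop. This is a sketch on the same toy frame; the prefix is chosen only to match the column names above:

```python
import pandas as pd

df = pd.DataFrame({'game': [1, 2, 3, 4], 'home_team': ['a', 'b', 'c', 'a']})

# One indicator column per unique home team; dtype=int gives 1/0 instead of True/False.
indicators = pd.get_dummies(df['home_team'], prefix='home_or_away', dtype=int)

# Attach the indicator columns to the original frame.
result = pd.concat([df, indicators], axis=1)
print(result)
```

This should produce the same columns as the loop, and it avoids a Python-level comparison per row.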

CodePudding user response:

You're overcomplicating it. You don't need to iterate, and you don't need a separate function around np.where(). Just apply np.where() to the two columns directly.

It basically says: "where home_team equals team_name, put a 1, else put a 0".

import pandas as pd
import numpy as np

dataf = pd.DataFrame([['Chicago Bears','Chicago Bears', 'Green Bay Packers'],
                   ['Chicago Bears','Green Bay Packers', 'Chicago Bears'],
                   ['Detroit Lions','Detroit Lions', 'Los Angeles Chargers'],
                   ['New England Patriots','New York Jets', 'New England Patriots'],
                   ['Houston Texans','Los Angeles Rams', 'Houston Texans']],
                  columns = ['team_name','home_team','away_team'])


dataf['home_or_away'] = np.where(dataf['home_team'] == dataf['team_name'], 1, 0)

Output:

print(dataf)
              team_name          home_team             away_team  home_or_away
0         Chicago Bears      Chicago Bears     Green Bay Packers             1
1         Chicago Bears  Green Bay Packers         Chicago Bears             0
2         Detroit Lions      Detroit Lions  Los Angeles Chargers             1
3  New England Patriots      New York Jets  New England Patriots             0
4        Houston Texans   Los Angeles Rams        Houston Texans             0
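The example above assumes the data already has a team_name column. If the dataset only has home_team and away_team (one row per game), one possible way to get a row per team per game is to reshape with melt first. This is a sketch under that assumption; the 'game' and 'side' column names and the sample rows are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: one row per game, with only home and away columns.
games = pd.DataFrame({
    'game': [1, 2],
    'home_team': ['Chicago Bears', 'New York Jets'],
    'away_team': ['Green Bay Packers', 'New England Patriots'],
})

# Reshape to one row per (game, team); 'side' records which column the team came from.
per_team = games.melt(id_vars='game', value_vars=['home_team', 'away_team'],
                      var_name='side', value_name='team_name')

# 1 for home games, 0 for away games.
per_team['home_or_away'] = (per_team['side'] == 'home_team').astype(int)

# Order rows by game, home team first, purely for readability.
per_team = per_team.sort_values(['game', 'side'], ascending=[True, False]).reset_index(drop=True)
print(per_team[['game', 'team_name', 'home_or_away']])
```

Each game then appears twice, once from each team's point of view, so filtering on team_name gives that team's home/away flag for every game.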