I am analyzing a dataset containing NFL game results over the past 20 years and am trying to create a column denoting, for each team, whether the game was a home game or an away game (home game = 1, away game = 0).
The code I have so far is:
home_list = list(df.home_team.unique())

def home_or_away(team_name, dataf):
    dataf['home_or_away'] = np.where(dataf['home_team'] == team_name, 1, 0)
    return dataf

for i in home_list:
    home_update_all = home_or_away(i, df)
    df.update(home_update_all)
This doesn't seem to yield the correct results as each team is just overwritten when iterating over them. Any ideas on how to solve this?
Thanks!
CodePudding user response:
I'm not really sure what your expected output is. Do you want one column per team? Right now every iteration writes to a column with the same name, so only the result of the last iteration is kept and the rest are overwritten. Or do you want multiple DataFrames?
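To make the overwriting concrete, here is a minimal sketch on toy data (the DataFrame below is made up for illustration and is not the asker's real dataset). It repeats the loop from the question without the helper function:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), standing in for the real NFL results
df = pd.DataFrame({'game': [1, 2, 3, 4], 'home_team': ['a', 'b', 'c', 'a']})

# Every pass writes to the same 'home_or_away' column,
# so each team replaces the previous team's result
for team in df['home_team'].unique():
    df['home_or_away'] = np.where(df['home_team'] == team, 1, 0)

# Only the last team in the loop ('c') is still flagged
print(df['home_or_away'].tolist())  # [0, 0, 1, 0]
```

The games where 'a' or 'b' played at home end up as 0 because the final pass for 'c' replaced them.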
If you want multiple columns, one per team:
import pandas as pd
df = pd.DataFrame({'game': [1, 2, 3, 4], 'home_team': ['a', 'b', 'c', 'a']})
>  game  home_team
0     1          a
1     2          b
2     3          c
3     4          a
First collect unique teams as you did:
home_list = list(df.home_team.unique())
Create a column for each team:
for team in home_list:
    df[f'home_or_away_{team}'] = [int(ht == team) for ht in df['home_team']]
Which results in:
>  game  home_team  home_or_away_a  home_or_away_b  home_or_away_c
0     1          a               1               0               0
1     2          b               0               1               0
2     3          c               0               0               1
3     4          a               1               0               0
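As a possible alternative to the explicit loop, the same per-team indicator columns can probably be built in one step with pd.get_dummies. This is only a sketch on the same toy data; the variable names dummies and result are my own:

```python
import pandas as pd

df = pd.DataFrame({'game': [1, 2, 3, 4], 'home_team': ['a', 'b', 'c', 'a']})

# One indicator column per team, named home_or_away_<team>
dummies = pd.get_dummies(df['home_team'], prefix='home_or_away', dtype=int)

# Attach the indicator columns to the original frame
result = pd.concat([df, dummies], axis=1)
print(result)
```

This should give the same columns as the loop above, which may be convenient when there are many teams.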
CodePudding user response:
You're overcomplicating it. You don't need to iterate when using NumPy's np.where(). Just apply np.where() to the two columns directly (no separate function needed).
It basically says "where home_team equals team_name, put a 1, else put a 0".
import pandas as pd
import numpy as np
dataf = pd.DataFrame([['Chicago Bears', 'Chicago Bears', 'Green Bay Packers'],
                      ['Chicago Bears', 'Green Bay Packers', 'Chicago Bears'],
                      ['Detroit Lions', 'Detroit Lions', 'Los Angeles Chargers'],
                      ['New England Patriots', 'New York Jets', 'New England Patriots'],
                      ['Houston Texans', 'Los Angeles Rams', 'Houston Texans']],
                     columns=['team_name', 'home_team', 'away_team'])

# Compare the two columns row by row: 1 where the team is the home team, else 0
dataf['home_or_away'] = np.where(dataf['home_team'] == dataf['team_name'], 1, 0)
Output:
print(dataf)
              team_name          home_team             away_team  home_or_away
0         Chicago Bears      Chicago Bears     Green Bay Packers             1
1         Chicago Bears  Green Bay Packers         Chicago Bears             0
2         Detroit Lions      Detroit Lions  Los Angeles Chargers             1
3  New England Patriots      New York Jets  New England Patriots             0
4        Houston Texans   Los Angeles Rams        Houston Texans             0
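The example above assumes the data already has a team_name column. If the real dataset instead has one row per game with only home_team and away_team (an assumption on my part, since the question doesn't show its layout), one way to get a row per team per game is to reshape with melt first. A rough sketch, where game_id, side and per_team are hypothetical names:

```python
import pandas as pd

# Hypothetical layout: one row per game, with both teams listed
games = pd.DataFrame({
    'game_id': [1, 2, 3],
    'home_team': ['Chicago Bears', 'Green Bay Packers', 'New York Jets'],
    'away_team': ['Green Bay Packers', 'Chicago Bears', 'New England Patriots'],
})

# Reshape to one row per team per game; 'side' records the source column
per_team = games.melt(id_vars=['game_id'],
                      value_vars=['home_team', 'away_team'],
                      var_name='side', value_name='team_name')

# 1 if the team came from the home_team column, else 0
per_team['home_or_away'] = (per_team['side'] == 'home_team').astype(int)

# Tidy up: drop the helper column and list the home team first for each game
per_team = (per_team.drop(columns='side')
                    .sort_values(['game_id', 'home_or_away'], ascending=[True, False])
                    .reset_index(drop=True))
print(per_team)
```

Each game then appears twice, once for the home team (1) and once for the away team (0), so filtering on team_name gives that team's full home/away history.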