I have some football data with two team names and some numerical data:
ar = [
["browns", "patriots", 2, 5],
["patriots", "bills", 4, 15],
["browns", "bills", 1, 10],
["eagles", "browns", 3, 11]
]
frame = pandas.DataFrame(ar, columns=['Team1', 'Team2', 'Down', 'ToGo'])
I want to turn the team names into an enumerated type, which I can do for the first team name like:
frame['Team1'], uniques = frame['Team1'].factorize()
How can I do this to both team names at the same time, so there is consistent uniques and not have missing values? Note that 'bills'
is not in Team1
column, so I can't just apply the mapping (though I don't know how to do that either).
Edit: The resulting team names should look like:
Team1 Team2
0 0 1
1 1 3
2 0 3
3 2 0
so that 'browns' is labeled consistently as '0', in row 0 as Team1, and in row 3 as Team2.
CodePudding user response:
import numpy as np
# get the unique values from the two columns
unique = np.unique(df[['Team1', 'Team2']])
# get the factors
factors = np.arange(len(unique))
# map the values to the corresponding factor
df[['Team1', 'Team2']] = df[['Team1', 'Team2']].replace(unique, factors)
>>> df
Team1 Team2 Down ToGo
0 1 3 2 5
1 3 0 4 15
2 1 0 1 10
3 2 1 3 11