Home > Net >  How to factorize multiple dataframe columns with same values?
How to factorize multiple dataframe columns with same values?

Time:11-06

I have some football data with two team names and some numerical data:

ar = [
    ["browns", "patriots", 2, 5],
    ["patriots", "bills", 4, 15],
    ["browns", "bills", 1, 10],
    ["eagles", "browns", 3, 11]
]

frame = pandas.DataFrame(ar, columns=['Team1', 'Team2', 'Down', 'ToGo'])

I want to turn the team names into an enumerated type, which I can do for the first team name like:

frame['Team1'], uniques = frame['Team1'].factorize()

How can I do this to both team names at the same time, so there is consistent uniques and not have missing values? Note that 'bills' is not in Team1 column, so I can't just apply the mapping (though I don't know how to do that either).

Edit: The resulting team names should look like:

   Team1  Team2
0      0      1
1      1      3
2      0      3
3      2      0

so that 'browns' is labeled consistently as '0', in row 0 as Team1, and in row 3 as Team2.

CodePudding user response:

import numpy as np

# get the unique values from the two columns
unique = np.unique(df[['Team1', 'Team2']])
# get the factors 
factors = np.arange(len(unique))
# map the values to the corresponding factor 
df[['Team1', 'Team2']] = df[['Team1', 'Team2']].replace(unique, factors)

>>> df

   Team1  Team2  Down  ToGo
0      1      3     2     5
1      3      0     4    15
2      1      0     1    10
3      2      1     3    11

  • Related