Preserve original column name in pd.get_dummies()

Time:01-28

I have a list of columns whose values are all strings. I need to one-hot encode them with pd.get_dummies().

I want to keep the original name of those columns along with the value. So let's say I have a column named Street, and its values are Paved and Not Paved. After running get_dummies(), I would like the 2 resulting columns to be named Street_Paved and Street_Not_Paved. Is this possible? Basically, I want each new column named {i}_{value}, where i is the original column name (the i in a for i in cols loop).

My code is:

import pandas as pd

cols = ['Street', 'Alley', 'CentralAir', 'Utilities', 'LandSlope', 'PoolQC']
pd.get_dummies(df, columns = cols, prefix = '', prefix_sep = '')
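A minimal sketch of what the call above produces, assuming a small hypothetical DataFrame with only Street and Alley (the question's real data is not shown). With prefix='' and prefix_sep='', each dummy column name is built as prefix + separator + value, so only the value is left and the original column name disappears:

```python
import pandas as pd

# Hypothetical sample data (not the question's real dataset)
df = pd.DataFrame({'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca')})
cols = ['Street', 'Alley']

# prefix='' and prefix_sep='' make each dummy column name just '' + '' + value
bare = pd.get_dummies(df, columns=cols, prefix='', prefix_sep='', dtype=int)
print(list(bare.columns))
# ['Not Paved', 'Paved', 'a', 'c']
```

If two encoded columns share a value (say both contain 'Y'), this also produces duplicate column labels.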

Answer:

If you remove the prefix = '' and prefix_sep = '' parameters, get_dummies() uses each original column name as the default prefix, joined to the value with the default separator _:

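import pandas as pd

df = pd.DataFrame({'Street' : ['Paved','Paved','Not Paved','Not Paved'],
                   'Alley':list('acca')})

cols = ['Street','Alley']

# dtype=int gives 0/1 columns (pandas >= 2.0 returns bool by default);
# the result goes to a new name so df stays available for the next example
out = pd.get_dummies(df, columns = cols, dtype=int)

print(out)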
   Street_Not Paved  Street_Paved  Alley_a  Alley_c
0                 0             1        1        0
1                 0             1        0        1
2                 1             0        0        1
3                 1             0        1        0
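To make the naming rule explicit, here is a short sketch on the same sample data. It relies on the documented option of passing prefix as a dict that maps each encoded column to its prefix; {c: c for c in cols} with prefix_sep='_' spells out the {i}_{value} pattern from the question and gives the same names as the default. The sorted value order shown here holds for plain string categories like these.

```python
import pandas as pd

# Hypothetical sample data, same as in the answer above
df = pd.DataFrame({'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca')})
cols = ['Street', 'Alley']

# Default naming: prefix = column name, prefix_sep = '_'
default = pd.get_dummies(df, columns=cols, dtype=int)

# Explicit dict prefix {column: prefix} reproduces the {i}_{value} pattern
explicit = pd.get_dummies(df, columns=cols, prefix={c: c for c in cols},
                          prefix_sep='_', dtype=int)

# Each new column is f"{column}_{value}", with the values in sorted order
expected = [f"{c}_{v}" for c in cols for v in sorted(df[c].unique())]

print(list(default.columns) == list(explicit.columns) == expected)  # True
```

In practice the default call is enough; the dict form is mainly useful when some columns need a prefix other than their own name.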

If you also need to replace all spaces with _, chain a rename:

cols = ['Street','Alley']
# encode, then replace spaces in every column name with _
df = pd.get_dummies(df, columns = cols, dtype=int).rename(columns=lambda x: x.replace(' ', '_'))

print (df)
   Street_Not_Paved  Street_Paved  Alley_a  Alley_c
0                 0             1        1        0
1                 0             1        0        1
2                 1             0        0        1
3                 1             0        1        0
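One design point: rename(columns=...) rewrites every column name, including columns that were not encoded. A sketch of an alternative, assuming a hypothetical extra column 'Lot Area' that should keep its space, is to clean only the category values before calling get_dummies():

```python
import pandas as pd

# Hypothetical data: 'Lot Area' is an extra column that is NOT one-hot encoded
df = pd.DataFrame({'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca'),
                   'Lot Area': [1, 2, 3, 4]})
cols = ['Street', 'Alley']

# Replace spaces in the category values only, then encode
cleaned = df.copy()
cleaned[cols] = cleaned[cols].apply(lambda s: s.str.replace(' ', '_', regex=False))
out = pd.get_dummies(cleaned, columns=cols, dtype=int)

print(list(out.columns))
# ['Lot Area', 'Street_Not_Paved', 'Street_Paved', 'Alley_a', 'Alley_c']
```

With the rename approach, 'Lot Area' would become 'Lot_Area' as well.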