I have a list of columns whose values are all strings, and I need to one-hot encode them with pd.get_dummies().
I want to keep the original name of each column along with the value.
Say I have a column named Street whose values are Paved and Not Paved.
After running get_dummies(), I would like the two resulting columns to be named Street_Paved and Street_Not_Paved. Is this possible? Basically, the format I want for the prefix parameter is {i}_{value}, with i being the column name, as in the common for i in cols idiom.
My code is:
cols = ['Street', 'Alley', 'CentralAir', 'Utilities', 'LandSlope', 'PoolQC']
pd.get_dummies(df, columns = cols, prefix = '', prefix_sep = '')
CodePudding user response:
If you remove the prefix = '' and prefix_sep = '' parameters, get_dummies() uses the default prefix, taken from the column names, with the default separator _:
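import pandas as pd

df = pd.DataFrame({'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca')})
cols = ['Street', 'Alley']
# dtype=int keeps the 0/1 output shown below (pandas >= 2.0 defaults to bool)
out = pd.get_dummies(df, columns=cols, dtype=int)
print(out)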
   Street_Not Paved  Street_Paved  Alley_a  Alley_c
0                 0             1        1        0
1                 0             1        0        1
2                 1             0        0        1
3                 1             0        1        0
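For comparison, here is a minimal sketch of why the call in the question loses the column names. It reuses the sample frame from this answer; the variable names no_prefix and with_prefix are only illustrative. With prefix = '' and prefix_sep = '', each new column is named after the value alone, while the defaults give {column}_{value}.

```python
import pandas as pd

df = pd.DataFrame({'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca')})
cols = ['Street', 'Alley']

# Call from the question: empty prefix and separator drop the column name
no_prefix = pd.get_dummies(df, columns=cols, prefix='', prefix_sep='')
print(no_prefix.columns.tolist())
# ['Not Paved', 'Paved', 'a', 'c']

# Defaults: the prefix is the column name and the separator is '_'
with_prefix = pd.get_dummies(df, columns=cols)
print(with_prefix.columns.tolist())
# ['Street_Not Paved', 'Street_Paved', 'Alley_a', 'Alley_c']
```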
If you also need to replace all spaces with _, add rename:
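cols = ['Street', 'Alley']
# rename() applies the function to every column label
# dtype=int keeps 0/1 values (pandas >= 2.0 defaults to bool)
df = pd.get_dummies(df, columns=cols, dtype=int).rename(columns=lambda x: x.replace(' ', '_'))
print(df)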
   Street_Not_Paved  Street_Paved  Alley_a  Alley_c
0                 0             1        1        0
1                 0             1        0        1
2                 1             0        0        1
3                 1             0        1        0
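Note that rename with a lambda touches every column label, including columns that were not encoded. If that matters, one possible variation is sketched below. The helper name one_hot_keep_names and the extra 'Lot Area' column are hypothetical, added only to show that untouched columns keep their spaces; this is an assumption about what you might want, not something get_dummies does for you.

```python
import pandas as pd

def one_hot_keep_names(df, cols):
    """One-hot encode `cols`; new columns are named '<column>_<value>'
    with spaces replaced by '_'. Hypothetical helper for illustration."""
    encoded = pd.get_dummies(df, columns=cols, dtype=int)
    # Labels of columns that were not encoded are left exactly as they were
    untouched = set(df.columns) - set(cols)
    return encoded.rename(
        columns=lambda c: c if c in untouched else c.replace(' ', '_'))

# 'Lot Area' is made-up sample data, present only to show it is not renamed
df = pd.DataFrame({'Lot Area': [8450, 9600, 11250, 9550],
                   'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca')})

result = one_hot_keep_names(df, ['Street', 'Alley'])
print(result.columns.tolist())
# ['Lot Area', 'Street_Not_Paved', 'Street_Paved', 'Alley_a', 'Alley_c']
```

With the list from the question you would call it as one_hot_keep_names(df, cols).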