My problem is that I need to convert several categorical columns into numbers for machine learning. I don't want to use LabelEncoder because I heard it's not as suitable for this as OneHotEncoder.
So I used this code:
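from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = df.drop("SalePrice", axis=1)
y = df['SalePrice']
one_hot = OneHotEncoder()
transformer = ColumnTransformer([("one_hot", one_hot, categorical_features)], remainder="passthrough")
transformed_X = transformer.fit_transform(X)  # fit on X; df still contains the SalePrice target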
where categorical_features is the list of columns I want to apply the OneHotEncoder to.
But I get a long, multi-line traceback that ends with:
TypeError: Encoders require their input to be uniformly strings or numbers. Got ['float', 'str']
Someone who had a similar issue was told to clean their data by removing NaN values. I have already done that, but nothing changed. I was also advised to convert my columns to strings, and I wrote a loop to do that.
Answer:
This error is pretty self-explanatory: a column you pass to the encoder cannot contain both str and float values.
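A minimal reproduction of the error, assuming hypothetical data in which one label of an otherwise text column was read in as a float. It uses a real float rather than NaN because recent scikit-learn releases treat np.nan as a category of its own, whereas a stray number among strings still fails. The exact wording of the message differs slightly between scikit-learn versions.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one label was parsed as a float (20.0),
# so the object column now holds both str and float values.
df = pd.DataFrame({"zone": ["RL", "RM", 20.0, "RL"]})

# The Python types present in the column.
column_types = sorted({type(v).__name__ for v in df["zone"]})
print(column_types)  # ['float', 'str']

# OneHotEncoder sorts the unique values of each column to build its
# categories; str and float cannot be compared, so fitting raises TypeError.
try:
    OneHotEncoder().fit(df[["zone"]])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```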
Make sure that every value within each of those columns has the same type.
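To see which columns actually mix types, a small helper can list the Python types found in each column. This is a sketch; the function name mixed_type_columns and the sample frame are hypothetical. A missing value stored as np.nan is a Python float, so it shows up as 'float'.

```python
import pandas as pd


def mixed_type_columns(df, columns):
    """Return {column: sorted type names} for columns holding more than one Python type."""
    report = {}
    for col in columns:
        type_names = sorted({type(v).__name__ for v in df[col]})
        if len(type_names) > 1:
            report[col] = type_names
    return report


# Hypothetical frame: "alley" contains a stray float among its labels.
df = pd.DataFrame({
    "street": ["Pave", "Grvl", "Pave"],
    "alley": ["Grvl", 1.0, "Pave"],
})
categorical_features = ["street", "alley"]
print(mixed_type_columns(df, categorical_features))  # {'alley': ['float', 'str']}
```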
You can try this to force every value to be a string:
# Cast every categorical column to str (NaN becomes the string "nan")
for e in categorical_features:
    df[e] = df[e].astype(str)
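Putting that loop together with the ColumnTransformer from the question gives roughly the following. It is a sketch on a small hypothetical stand-in for the housing data (the column names other than SalePrice are assumptions), and it fits on X rather than on df so the target is not encoded as a feature.

```python
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the housing data: "Alley" mixes str and float.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
    "Alley": ["Grvl", 1.0, "Pave", "Grvl"],
    "SalePrice": [208500, 181500, 223500, 140000],
})
categorical_features = ["Street", "Alley"]

# Force every categorical column to a single type (str).
for e in categorical_features:
    df[e] = df[e].astype(str)

X = df.drop("SalePrice", axis=1)
y = df["SalePrice"]

transformer = ColumnTransformer(
    [("one_hot", OneHotEncoder(), categorical_features)],
    remainder="passthrough",  # LotArea is passed through unchanged
)
transformed_X = transformer.fit_transform(X)

# ColumnTransformer may return a sparse matrix when the result is mostly zeros.
if sparse.issparse(transformed_X):
    transformed_X = transformed_X.toarray()

print(transformed_X.shape)  # (4, 6): 2 Street + 3 Alley categories + LotArea
```

Note that the stray 1.0 in this example simply becomes its own category "1.0" (and a NaN would become "nan"), so casting to str silences the error without telling you whether that value was a data-entry problem.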
Or, if everything in those columns should be numeric, you may have a different problem with your data. In that case, use something like isnumeric to find the values that are not numbers.
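A sketch of that check, assuming a hypothetical column that should be numeric. It swaps in pandas' pd.to_numeric(errors="coerce") for isnumeric, because str.isnumeric returns False for decimal strings such as "3.5" and would flag them as well. The helper name non_numeric_values is hypothetical.

```python
import numpy as np
import pandas as pd


def non_numeric_values(series: pd.Series) -> pd.Series:
    """Return the entries of `series` that cannot be parsed as numbers.

    Entries that were already missing (NaN/None) are not reported.
    """
    # Anything that fails to parse becomes NaN instead of raising.
    parsed = pd.to_numeric(series, errors="coerce")
    # Keep only entries that became NaN because parsing failed.
    return series[parsed.isna() & series.notna()]


# Hypothetical column mixing numbers, numeric strings, text and a missing value.
lot_frontage = pd.Series(["65", 80.0, "ten", np.nan, "3.5"])
print(non_numeric_values(lot_frontage).tolist())  # ['ten']

# str.isnumeric rejects the decimal point, so it would flag "3.5" too.
print("3.5".isnumeric())  # False
```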