I'm building a one-hot encoding function for a pandas DataFrame and can't figure out how to get the encoded data back into the DataFrame. I get:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
How do I reintegrate the encoded data into the pandas DataFrame?
```python
def one_hot_encoder(features, df_to_encode):
    """One-hot encode columns of a DataFrame.

    Parameters:
        features (list): columns to encode
        df_to_encode (pandas DataFrame): DataFrame to encode
    Returns:
        DataFrame: encoded DataFrame
    """
    from sklearn.preprocessing import OneHotEncoder
    for column in features:
        # one hot encoder (use sparse=False on scikit-learn < 1.2)
        enc = OneHotEncoder(sparse_output=False)
        column_norm = column + "_encoded"
        # fit_transform returns a NumPy array, not a DataFrame
        df = enc.fit_transform(df_to_encode[[column]])
    return df

columns_to_one_hot_encode = ["type"]
df = one_hot_encoder(columns_to_one_hot_encode, df)
```
The data I'm using is from https://www.kaggle.com/ealaxi/paysim1
Answer:
You don't need sklearn; you can simply use `pandas.get_dummies`:
```python
import pandas as pd

def one_hot_encoder(features, df_to_encode):
    """One-hot encode columns of a DataFrame.

    Parameters:
        features (list): columns to encode
        df_to_encode (pandas DataFrame): DataFrame to encode
    Returns:
        DataFrame: copy of df_to_encode with each listed column replaced
        by one indicator column per category
    """
    return pd.get_dummies(df_to_encode, columns=features)

columns_to_one_hot_encode = ["type"]
df = one_hot_encoder(columns_to_one_hot_encode, df)
```
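To see what `pd.get_dummies` returns, here is a copy of the function from this answer applied to a few made-up rows (not actual PaySim data). The listed column is replaced by one indicator column per category, named `<column>_<value>`, while the other columns are kept, so the result is already a DataFrame. The `dtype=int` argument is an optional addition of mine: it gives 0/1 integers instead of the booleans pandas 2.x produces by default.

```python
import pandas as pd

def one_hot_encoder(features, df_to_encode):
    """One-hot encode the listed columns with pandas.get_dummies."""
    # dtype=int is an optional tweak: 0/1 integers instead of booleans.
    return pd.get_dummies(df_to_encode, columns=features, dtype=int)

# Made-up sample rows with PaySim-style "type" values.
df = pd.DataFrame({"step": [1, 1, 2, 2],
                   "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
                   "amount": [9839.64, 181.0, 229133.94, 1864.28]})

encoded = one_hot_encoder(["type"], df)
print(encoded.columns.tolist())
# ['step', 'amount', 'type_CASH_OUT', 'type_PAYMENT', 'type_TRANSFER']
print(encoded)
```

Untouched columns come first and the indicator columns follow, in sorted category order.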
Answer:
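You can use the `get_feature_names` method built into scikit-learn's `OneHotEncoder` (called `get_feature_names_out` since scikit-learn 1.0; the old name was removed in 1.2) to name the new columns, and then drop the original column. This way you can still use `OneHotEncoder` instead of `pd.get_dummies`: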
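```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

def one_hot_encoder(features, df_to_encode):
    """One-hot encode columns of a DataFrame.

    Parameters:
        features (list): columns to encode
        df_to_encode (pandas DataFrame): DataFrame to encode
    Returns:
        DataFrame: encoded DataFrame
    """
    for column in features:
        # Dense output (use sparse=False on scikit-learn < 1.2)
        enc = OneHotEncoder(sparse_output=False)
        # Keep the original index so concat lines rows up correctly
        df_enc = pd.DataFrame(enc.fit_transform(df_to_encode[[column]]),
                              index=df_to_encode.index)
        # Column names such as "type_PAYMENT"
        df_enc.columns = enc.get_feature_names_out([column])
        # Swap the original column for its encoded columns; reassigning
        # keeps earlier encoded columns and leaves the caller's df untouched
        df_to_encode = pd.concat([df_to_encode.drop(columns=column), df_enc], axis=1)
    return df_to_encode

columns_to_one_hot_encode = ["type"]
df = one_hot_encoder(columns_to_one_hot_encode, df)
```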