How to One Hot Encode a Dataframe Column in Python?

I'm trying to one-hot encode a DataFrame column with scikit-learn's OneHotEncoder, using this code:

from sklearn.preprocessing import OneHotEncoder
df['label'] = OneHotEncoder().fit(df['label']).toarray()

This is the traceback:

ValueError: Expected 2D array, got 1D array instead:
  array=['Label1' 'Label1' 'Label1' 'Label1' 'Label1'
 'Label1' 'Label1' 'Label1' 'Label1' 'Label1'
  'Label2' 'Label2' 'Label2' 'Label2' 'Label2' 'Label2' 'Label2'
 'Label2' 'Label2' 'Label2' 'Label3' 'Label3' 'Label3' 'Label3' 'Label3' 'Label3'
 'Label3' 'Label3' 'Label3' 'Label3' 'Label4' 'Label4' 'Label4' 'Label4' 'Label4' 'Label4'
 'Label4' 'Label4' 'Label4' 'Label4' 'Label5' 'Label5' 'Label5'
 'Label5' 'Label5' 'Label5' 'Label5' 'Label5'
 'Label5' 'Label5' 'Label6' 'Label6' 'Label6'
 'Label6' 'Label6' 'Label6' 'Label6' 'Label6'
 'Label6' 'Label6' 'Label7' 'Label7' 'Label7'
 'Label7' 'Label7' 'Label7' 'Label7' 'Label7'
 'Label7' 'Label7' 'Label8' 'Label8' 'Label8' 'Label8' 'Label8'
 'Label8' 'Label8' 'Label8' 'Label8' 'Label8' 'Label9' 'Label9'
 'Label9' 'Label9' 'Label9' 'Label9' 'Label9' 'Label9'
 'Label9' 'Label9' 'Label10' 'Label10' 'Label10' 'Label10' 'Label10'
 'Label10' 'Label10' 'Label10' 'Label10' 'Label10' 'Label11' 'Label11'
 'Label11' 'Label11' 'Label11' 'Label11' 'Label11' 'Label11' 'Label11' 'Label11'
 'Label12' 'Label12' 'Label12' 'Label12' 'Label12' 'Label12'
 'Label12' 'Label12' 'Label12' 'Label12'].
  Reshape your data either using array.reshape(-1, 1) if your data has a single feature or 
  array.reshape(1, -1) if it contains a single sample.

I already tried to reshape the column, but that fails with an error saying that a Series has no attribute reshape. Is there a workaround that lets me use OneHotEncoder?

CodePudding user response:

See below, but note that you cannot assign the result of OneHotEncoder to a single DataFrame column, because it produces one column per category. If you want a single column of integer codes, I suspect that you are looking for LabelEncoder instead.

OneHotEncoder

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    'label': ['Label1', 'Label4', 'Label2', 'Label2', 'Label1', 'Label3', 'Label3']
})

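# OneHotEncoder expects 2D input: take the NumPy array and reshape it to one column
X = df['label'].values.reshape(-1, 1)
enc = OneHotEncoder().fit(X)

# transform() returns a sparse matrix by default; toarray() makes it dense
X = enc.transform(X).toarray()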
print(X)
# [[1. 0. 0. 0.]
#  [0. 0. 0. 1.]
#  [0. 1. 0. 0.]
#  [0. 1. 0. 0.]
#  [1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 1. 0.]]

X = enc.inverse_transform(X)
print(X)
# [['Label1']
#  ['Label4']
#  ['Label2']
#  ['Label2']
#  ['Label1']
#  ['Label3']
#  ['Label3']]
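
To get the encoded values back into the data frame, one option is to wrap the array in a new DataFrame with one column per category and place it next to the original. This is only a minimal sketch, assuming the same `df` as above; the `label_` column prefix and the names `encoded` and `result` are illustrative choices, not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    'label': ['Label1', 'Label4', 'Label2', 'Label2', 'Label1', 'Label3', 'Label3']
})

# Double brackets select a one-column DataFrame, which is already 2D,
# so no reshape is needed here
enc = OneHotEncoder()
X = enc.fit_transform(df[['label']]).toarray()

# enc.categories_[0] holds the category values found for the single input column
columns = ['label_' + str(c) for c in enc.categories_[0]]
encoded = pd.DataFrame(X, columns=columns, index=df.index)

# One new column per category, side by side with the original column
result = pd.concat([df, encoded], axis=1)
print(result)
```

Reusing `df.index` for the new frame keeps the rows aligned when the two frames are concatenated.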

LabelEncoder

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'label': ['Label1', 'Label4', 'Label2', 'Label2', 'Label1', 'Label3', 'Label3']
})

# LabelEncoder works on 1D input, so no reshape is needed
y = df['label'].values
enc = LabelEncoder().fit(y)

y = enc.transform(y)
print(y)
# [0 3 1 1 0 2 2]

y = enc.inverse_transform(y)
print(y)
# ['Label1' 'Label4' 'Label2' 'Label2' 'Label1' 'Label3' 'Label3']
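
If the goal really is a single encoded column, the integer codes can be written straight back to the data frame. A short sketch, assuming the same `df`; the column name `label_id` is a hypothetical choice. Keep in mind that scikit-learn documents LabelEncoder as a tool for target values, and that integer codes suggest an ordering between categories that may not exist.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'label': ['Label1', 'Label4', 'Label2', 'Label2', 'Label1', 'Label3', 'Label3']
})

# fit_transform accepts the 1D Series directly and returns one integer per row
df['label_id'] = LabelEncoder().fit_transform(df['label'])
print(df)
```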

CodePudding user response:

pandas has a dedicated function for this, called get_dummies:

pd.get_dummies(df['label'])
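
As a slightly fuller sketch of the same idea, the indicator columns can be joined back onto the original frame. This assumes the sample `df` from the first answer; `dummies` and `result` are illustrative names. The dtype of the indicator columns depends on the pandas version (boolean in recent releases, small integers in older ones).

```python
import pandas as pd

df = pd.DataFrame({
    'label': ['Label1', 'Label4', 'Label2', 'Label2', 'Label1', 'Label3', 'Label3']
})

# One indicator column per distinct value in 'label'
dummies = pd.get_dummies(df['label'])

# join() aligns on the index, so each row keeps its own indicators
result = df.join(dummies)
print(result)
```

Unlike OneHotEncoder, get_dummies does not remember the categories it saw, so applying it to new data later can yield a different set of columns.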