sklearn OneHotEncoder wrong shape

Time:11-06

I have an array:

y_train: array([ 0,  0,  0, -1, 1, 0, -1, 0, ..., -1, 0, 1], dtype=int64)

I did this:

enc = OneHotEncoder()
y_train = enc.fit_transform(y_train.reshape(1,-1))

and the result was

(0, 0)  1.0
(0, 1)  1.0
(0, 2)  1.0
(0, 3)  1.0
(0, 4)  1.0
(0, 5)  1.0

But what I really want is for it to be one-hot encoded like the following:

[1,0,0]
[1,0,0]
[0,1,0]
[0,0,1]
.....

How can I fix it?

CodePudding user response:

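Reshape the array into a single column with reshape(-1, 1) instead of reshape(1, -1). OneHotEncoder treats rows as samples and columns as features, so reshape(1, -1) hands it one sample with many features, each of which has only a single category. fit_transform also returns a sparse matrix by default, so call toarray() on the result to get a dense array: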

from sklearn import preprocessing
import numpy as np

y_train = np.array([0, 0, 0, -1, 1, 0, -1, 0, -1, 0, 1]).reshape(-1, 1)
enc = preprocessing.OneHotEncoder()
y_train = enc.fit_transform(y_train).toarray()
print(y_train)

This prints the following output, where the columns correspond to the sorted categories -1, 0 and 1:

[[0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
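
To see why the shape matters, here is a small sketch that encodes the same values both ways and compares the results. The variable names (y, wide, tall) are only for illustration and are not from the original question; it assumes a reasonably recent scikit-learn where categories_ and inverse_transform are available on the fitted encoder.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

y = np.array([0, 0, 0, -1, 1, 0, -1, 0, -1, 0, 1])

# reshape(1, -1): ONE sample with 11 features. Each feature only ever
# takes a single value, so each gets exactly one column that is always 1.
wide = OneHotEncoder().fit_transform(y.reshape(1, -1))

# reshape(-1, 1): 11 samples of ONE feature with three distinct values.
enc = OneHotEncoder()
tall = enc.fit_transform(y.reshape(-1, 1)).toarray()

print(wide.shape)                 # (1, 11)
print(tall.shape)                 # (11, 3)

# Column order follows the sorted categories learned during fit.
print(enc.categories_[0].tolist())  # [-1, 0, 1]

# The encoding can be mapped back to the original labels.
decoded = enc.inverse_transform(tall).ravel()
print(np.array_equal(decoded, y))   # True
```

Note that the column order comes from the sorted category values, so 0 is encoded as [0, 1, 0] here rather than [1, 0, 0] as in the question's wished-for output.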