I am a beginner in ML/AI and am trying to preprocess a dataset of digits that I made myself. I want to apply one-hot encoding to my categorical variable (it is the dependent variable, I don't know if that matters), but I am getting a "tuple index out of range" error. I searched online and the only solution I found was to use the reshape() function, but it didn't help, or maybe I am not using it correctly.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
#Data Preprocessing
dataset = pd.read_csv('dataset_cisla_polia2.csv',header = None,sep = ';')
X = dataset.iloc[:, 0:28].values
y = dataset.iloc[:, 29].values
print(X)
print(y)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[29])],remainder = 'passthrough')
y = np.array(ct.fit_transform(y))
I expect y to end up like this: digit 1 encoded as [1 0 0 0 0 0 0 0 0 0], digit 2 encoded as [0 1 0 0 0 0 0 0 0 0], and so on.
CodePudding user response:
This is because
ct = ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[29])],remainder = 'passthrough')
tells the transformer to one-hot encode the column at index 29 of whatever you pass in. You are fit-transforming y, which only holds a single column, so index 29 does not exist. Change the 29 to 0:
ct = ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[0])],remainder = 'passthrough')
Edit
You also need to change the iloc call so that y stays a 2D column array of shape (n_samples, 1) instead of a 1D array:
y = dataset.iloc[:, [29]].values
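To see both changes working together, here is a minimal sketch. It does not read the original CSV. The `labels` array is a hypothetical stand-in for the label column with only three classes, and `sparse_threshold=0` is my own addition (not part of the answer above) so that the result comes back as a dense NumPy array rather than a sparse matrix.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the label column of the CSV (three classes only).
labels = np.array([1, 2, 3, 1, 2])

# Equivalent of dataset.iloc[:, 29].values: a 1D array, so there is no column axis.
y_1d = labels
# Equivalent of dataset.iloc[:, [29]].values: a 2D array with a single column.
y_2d = labels.reshape(-1, 1)
print(y_1d.shape, y_2d.shape)

ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [0])],  # column 0 of y_2d
    remainder='passthrough',
    sparse_threshold=0,  # assumption: always return a dense array
)
y_encoded = ct.fit_transform(y_2d)

print(y_encoded.shape)
print(y_encoded)
```

Each row of `y_encoded` has one column per distinct label, ordered by sorted label value, so the smallest label maps to the first position. The reshape(-1, 1) call here does the same job as the double brackets in iloc, which may be what the reshape() suggestions you found were aiming at.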