I have NumPy arrays split into X and y, originally built from a Pandas DataFrame:
>> X
array([[ 2.86556780e-03, 1.87100798e-01],
[ 2.56781670e-04, 2.45417491e-01],
[ 2.35497137e-03, 1.76615342e-01],
...,
[ 2.30078468e-03, -4.16726811e-60],
[ 5.66213972e-03, -2.98597808e-60],
[ 4.39503905e-03, -2.13954678e-60]])
>> y
array([19.08666992, 19.09239006, 19.08938026, ..., 45.21157634,
45.19350761, 45.13230675])
I split them into training and test datasets as follows:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Before scaling the data, I reshape my labels as follows:
y_train = y_train.reshape((-1, 1))
y_test = y_test.reshape((-1, 1))
Using sklearn's MinMaxScaler, I then fit_transform my training data as follows:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
y_train = scaler.fit_transform(y_train)
I then try to transform my test data using the same MinMaxScaler as follows:
X_test = scaler.transform(X_test)
y_test = scaler.transform(y_test)
But the test dataset is not transformed; I get the following error:
----> 1 X_test = scaler.transform(X_test)
ValueError: X has 2 features, but MinMaxScaler is expecting 1 features as input.
Can anyone tell me what I am doing wrong here?
CodePudding user response:
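This is because each call to fit_transform refits the scaler and discards the previous fit. By the time you call scaler.transform(X_test), scaler has been refit to y_train, which has a single feature, whereas X_test has 2 features.
You have to define separate scaler objects for X and y: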
scaler_X = MinMaxScaler()
scaler_Y = MinMaxScaler()
X_train = scaler_X.fit_transform(X_train)
y_train = scaler_Y.fit_transform(y_train)
X_test = scaler_X.transform(X_test)
y_test = scaler_Y.transform(y_test)
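As a self-contained sketch of the two-scaler approach, the snippet below runs the whole pipeline on made-up data. The random arrays, the random_state values, and the *_scaled and y_test_restored names are assumptions for illustration only; they are not part of the question. It also shows one practical reason to keep scaler_Y around: inverse_transform maps scaled targets (or model predictions) back to the original units.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 100 samples, 2 features, 1 target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.uniform(19.0, 46.0, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# MinMaxScaler expects 2-D input, so turn the targets into column vectors.
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)

# One scaler per array: each remembers the min/max of what it was fit on.
scaler_X = MinMaxScaler()
scaler_Y = MinMaxScaler()

# Fit on the training data only, then reuse the fitted statistics on the test data.
X_train_scaled = scaler_X.fit_transform(X_train)
y_train_scaled = scaler_Y.fit_transform(y_train)
X_test_scaled = scaler_X.transform(X_test)
y_test_scaled = scaler_Y.transform(y_test)

# Map scaled targets (or model predictions) back to the original units.
y_test_restored = scaler_Y.inverse_transform(y_test_scaled)
```

Because the scalers are fit on the training split only, scaled test values may fall slightly outside [0, 1]; that is expected and not an error.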
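Another way to do the same job is to reuse a single scaler, as long as the order is right: fit it to X_train and transform X_test first, and only then refit it to y_train and transform y_test. After the refit the scaler no longer holds the X statistics, so it can no longer transform or inverse-transform X: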
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
y_train = scaler.fit_transform(y_train)
y_test = scaler.transform(y_test)
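To see why the order matters with a single scaler, here is a minimal sketch on tiny hand-made arrays (the values and the features_after_X, features_after_y, and raised names are illustrative assumptions). It checks the fitted n_features_in_ attribute before and after the refit and reproduces the error from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Tiny illustrative arrays (assumption): 2 features in X, 1 column in y.
X_train = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])
y_train = np.array([[19.0], [30.0], [45.0]])
X_test = np.array([[0.5, 15.0]])

scaler = MinMaxScaler()

# While the scaler is fit to X_train, transforming X_test works.
scaler.fit(X_train)
features_after_X = scaler.n_features_in_  # 2
X_test_scaled = scaler.transform(X_test)

# Refitting on y_train resets the scaler; the X statistics are gone.
scaler.fit(y_train)
features_after_y = scaler.n_features_in_  # 1

# Transforming 2-feature data now fails, as in the question.
try:
    scaler.transform(X_test)
    raised = False
except ValueError:
    raised = True
```

If X will need to be transformed or inverse-transformed again later, keeping two separate scaler objects avoids this pitfall.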