ValueError with MinMaxScaler inverse_transform

Time:03-21

I am trying to fit an LSTM network to a dataset.

I have the following dataset:

0      17.6  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
1      38.2  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
2      39.4  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
3      38.7  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
4      39.7  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
...     ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   
17539  56.9  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
17540  51.1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
17541  46.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
17542  44.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
17543  40.2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  1.0  0.0  0.0   

        27   28   29   30   31   32   33  
0      0.0  0.0  1.0  0.0  0.0  1.0  0.0  
1      0.0  0.0  1.0  0.0  0.0  1.0  0.0  
2      0.0  0.0  1.0  0.0  0.0  1.0  0.0  
3      0.0  0.0  1.0  0.0  0.0  1.0  0.0  
4      0.0  0.0  1.0  0.0  0.0  1.0  0.0  
...    ...  ...  ...  ...  ...  ...  ...  
17539  0.0  0.0  0.0  0.0  1.0  0.0  1.0  
17540  0.0  0.0  0.0  0.0  1.0  0.0  1.0  
17541  0.0  0.0  0.0  0.0  1.0  0.0  1.0  
17542  0.0  0.0  0.0  0.0  1.0  0.0  1.0  
17543  0.0  0.0  0.0  0.0  1.0  0.0  1.0

with shape:

[17544 rows x 34 columns]

Then I scale it with MinMaxScaler as follows:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
data = scaler.fit_transform(data)

Then I use a function to create my train and test datasets, with shapes:

X_train :  (12232, 24, 34)
Y_train :  (12232, 24)

X_test :  (1708, 24, 34)
Y_test :  (1708, 24)

After fitting the model and predicting the values for the test set, I need to scale the results back to the original values, so I do the following:

test_predict  = model.predict(X_test)
test_predict  = scaler.inverse_transform(test_predict)
Y_test = scaler.inverse_transform(Y_test)

But I am getting the following error:

ValueError: operands could not be broadcast together with shapes (1708,24) (34,) (1708,24) 

How can I resolve it?

CodePudding user response:

The inverse transformation expects data with the same number of columns as the data the scaler was fitted on, i.e. 34 columns. Neither your test_predict nor your Y_test has that shape: both have 24 columns, which is why the error message reports shapes (1708,24) and (34,).
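
To see the mismatch in isolation, here is a minimal sketch that reproduces it on random data. The arrays are made-up stand-ins (not your dataset); only the column counts, 34 and 24, are taken from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Stand-in for the 34-column dataset from the question (random values).
data = rng.random((100, 34))
scaler = MinMaxScaler(feature_range=(0, 1)).fit(data)

# Stand-in for test_predict: 24 columns instead of 34.
fake_predictions = rng.random((10, 24))

try:
    scaler.inverse_transform(fake_predictions)
    raised = False
except ValueError as err:
    # The scaler holds 34 per-column minima/scales and cannot apply
    # them to an array that has only 24 columns.
    raised = True
    message = str(err)

# An array with the fitted number of columns works as expected.
ok = scaler.inverse_transform(rng.random((10, 34)))
```

The scaler stores one minimum and one scale factor per fitted column, so any array passed to inverse_transform must have 34 columns as well.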

Additionally, although unrelated to your error, you are scaling first and splitting into train/test afterwards. This is not the correct methodology, as it leads to data leakage: the scaler's minimum and maximum are then computed using the test data too.
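
A small sketch of what the leakage looks like, using five made-up values where the last one plays the role of the test set. The variable names (leaky, clean) are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-column series; the last row is treated as the test set.
values = np.array([[10.0], [20.0], [30.0], [40.0], [100.0]])
train, test = values[:4], values[4:]

# Scaling before splitting: the scaler has seen the test value 100.
leaky = MinMaxScaler().fit(values)

# Splitting before scaling: the scaler only knows the training range.
clean = MinMaxScaler().fit(train)

leaky_train = leaky.transform(train)   # training data squeezed by the test maximum
clean_train = clean.transform(train)   # training data spans the full [0, 1] range
clean_test = clean.transform(test)     # may fall outside [0, 1], which is fine
```

With the leaky scaler, the training values are compressed by a maximum that would not be known at training time. With the clean scaler, unseen test values can land outside the (0, 1) range, which is expected behaviour rather than a bug.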

Here are the necessary steps to resolve this:

  1. Split first into training and test sets.
  2. Transform your X_train and y_train using two different scalers, one for the features and one for the output, as I show in this answer of mine; you should use .fit_transform here.
  3. Fit your model with the transformed X_train and y_train (side note: it is good practice to use different names for different versions of the data, instead of overwriting the existing ones).
  4. To evaluate your model with the test data X_test and y_test, first transform them using the respective scalers from step #2; you should use .transform here (not .fit_transform again).
  5. To get your predictions y_pred back to the scale of your original y_test, use .inverse_transform of the output scaler on them. There is of course no need to inverse transform your transformed X_test and y_test, as you already have the original values.
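
The steps above can be sketched roughly as follows. This is only an illustration under several assumptions: the data is random, column 0 is assumed to be the target, make_windows is a hypothetical helper standing in for the windowing function from the question (which is not shown), and the model is replaced by a noisy copy of the true values so the snippet runs without Keras.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
n_rows, n_features, window = 500, 34, 24

# Synthetic stand-in for the dataset; column 0 is assumed to be the target.
raw = rng.random((n_rows, n_features))
raw[:, 0] = 20 + 40 * raw[:, 0]

# Step 1: split first (chronologically, since this is a time series).
split = int(n_rows * 0.8)
train_raw, test_raw = raw[:split], raw[split:]

# Step 2: two scalers, fitted on the training data only.
x_scaler = MinMaxScaler(feature_range=(0, 1))
y_scaler = MinMaxScaler(feature_range=(0, 1))
train_x_scaled = x_scaler.fit_transform(train_raw)
train_y_scaled = y_scaler.fit_transform(train_raw[:, [0]])

# Step 4: the test data is only transformed, never fitted on.
test_x_scaled = x_scaler.transform(test_raw)
test_y_scaled = y_scaler.transform(test_raw[:, [0]])

def make_windows(x, y, window):
    """Hypothetical helper: `window` past rows as input, next `window` targets as output."""
    xs, ys = [], []
    for start in range(len(x) - 2 * window + 1):
        xs.append(x[start:start + window])
        ys.append(y[start + window:start + 2 * window, 0])
    return np.array(xs), np.array(ys)

X_train, Y_train_scaled = make_windows(train_x_scaled, train_y_scaled, window)
X_test, Y_test_scaled = make_windows(test_x_scaled, test_y_scaled, window)
# The unscaled targets are kept under their own name; no inverse transform needed.
_, Y_test = make_windows(test_x_scaled, test_raw[:, [0]], window)

# Step 3: model.fit(X_train, Y_train_scaled) would go here.
# Stand-in for model.predict(X_test): the scaled truth plus a little noise.
test_predict_scaled = Y_test_scaled + rng.normal(0, 0.01, Y_test_scaled.shape)

# Step 5: y_scaler was fitted on one column, so flatten, invert, and restore the shape.
test_predict = y_scaler.inverse_transform(
    test_predict_scaled.reshape(-1, 1)
).reshape(test_predict_scaled.shape)
```

Fitting y_scaler on the single target column and reshaping around inverse_transform is one possible design; it keeps every one of the 24 predicted steps on the same scale. test_predict can then be compared directly with Y_test, both in the original units.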