I am having trouble working with dataframe(X.inverseTransform(data_out), columns=data_out.columns)

Time:03-13

I am working on a linear regression case and got stuck while validating my results. For the validation I have to use:

sns.regplot(x=X_2["pk"], y=y_2)

scaler_2 = StandardScaler()
scaler_2.fit(df)
# type(scaler_2)

X_2 = df.drop(['prijs'], axis=1)
# print(X_2.shape)
# type(X_2)

y_2 = df['prijs']
# print(y_2.shape)
# type(y_2)

#======================
test_data = 0.30
X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(X_2,y_2, test_size=test_data, random_state=12)
# print(f"formaat X_train_2 {X_train_2.shape}")
# print(f"formaat y_train_2 {y_train_2.shape}")
# print(f"formaat X_test_2  {X_test_2.shape}")
# print(f"formaat y_test_2  {y_test_2.shape}")

# X_train_2 = None
# X_test_2 = None
# y_train_2 = None
# y_test_2 = None
model_2 = LinearRegression()
X_train_simpel = X_train_2[['pk']]
X_test_simpel = X_test_2[['pk']]
fit_2 = model_2.fit(X_train_simpel, y_train_2)
uitkomst_2 = fit_2.predict(X_train_simpel)
uitkomst_3 = fit_2.predict(X_test_simpel)

data_out = X_train_2
data_out = pd.DataFrame(scaler_2.inverse_transform(data_out),columns=data_out.columns)
data_out['groep'] = uitkomst_2

data_out.head(5)

But then I am getting this error while running the last 2 lines of code:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [136], in <cell line: 2>()
      1 #haal de originele ongeschaalde waardes terug
----> 2 data_out = pd.DataFrame(scaler_2.inverse_transform(data_out),columns=data_out.columns)
      3 data_out['groep'] = uitkomst_2
      4 data_out.head(5)

File C:\Python310\lib\site-packages\sklearn\preprocessing\_data.py:1035, in StandardScaler.inverse_transform(self, X, copy)
   1033 else:
   1034     if self.with_std:
-> 1035         X *= self.scale_
   1036     if self.with_mean:
   1037         X += self.mean_

ValueError: operands could not be broadcast together with shapes (11484,7) (8,) (11484,7)

CodePudding user response:

'scaler_2' was fitted on all 8 columns of 'df', including 'prijs', but 'scaler_2.inverse_transform(data_out)' is called on a dataframe that has only 7 columns.

The 'prijs' column is dropped after 'scaler_2' has been fitted, so the scaler holds 8 scale values while 'data_out' has 7 columns. That mismatch is what the shapes (11484,7) and (8,) in the error message refer to. Drop the 'prijs' column first, and only then fit 'scaler_2' on the remaining columns.
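To see the shape mismatch in isolation, here is a minimal sketch using only NumPy. It mimics the `X *= self.scale_` step shown in the traceback. The arrays `X` and `scale` and the helper `try_scale` are made up for illustration and are not part of the original code.

```python
import numpy as np

# Hypothetical stand-ins: 5 rows with 7 feature columns, and a scale
# vector of length 8 (one entry per column the scaler was fitted on).
X = np.ones((5, 7))
scale = np.full(8, 2.0)


def try_scale(data, scale_vector):
    """Return the scaled array, or the error message if the shapes do not match."""
    try:
        return data * scale_vector
    except ValueError as exc:
        return str(exc)


mismatch = try_scale(X, scale)   # 7 columns against 8 scale entries: fails
ok = try_scale(X, scale[:7])     # 7 columns against 7 scale entries: works

print(mismatch)
print(ok.shape)
```

The first call fails for the same reason as the question's code: the number of columns in the data has to equal the number of columns the scaler was fitted on.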

The following code could fix your problem:

...
scaler_2 = StandardScaler()
# Drop the target first, so the scaler is fitted on the feature columns only
X_2 = df.drop(['prijs'], axis=1)
scaler_2.fit(X_2)
...
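For reference, here is a rough end-to-end sketch of how the pieces could fit together. Several things in it are assumptions: the dataframe is synthetic (the 'km' column and all values are invented), and the scaler is fitted on the training features only, which is a variation on the fix above. Note also that the code in the question never calls `transform`, so this sketch assumes the intent was to scale the features first and undo the scaling afterwards.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the original df (assumption: real data differs).
rng = np.random.default_rng(12)
df = pd.DataFrame({
    "pk": rng.uniform(50, 300, size=200),
    "km": rng.uniform(0, 200_000, size=200),
})
df["prijs"] = 100 * df["pk"] - 0.05 * df["km"] + rng.normal(0, 500, size=200)

# Separate the features from the target before any scaling.
X_2 = df.drop(["prijs"], axis=1)
y_2 = df["prijs"]
X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(
    X_2, y_2, test_size=0.30, random_state=12
)

# Fit the scaler on the feature columns only, then scale them.
scaler_2 = StandardScaler()
X_train_scaled = pd.DataFrame(
    scaler_2.fit_transform(X_train_2),
    columns=X_train_2.columns,
    index=X_train_2.index,
)

# Simple regression on the single scaled feature 'pk'.
model_2 = LinearRegression()
model_2.fit(X_train_scaled[["pk"]], y_train_2)
uitkomst_2 = model_2.predict(X_train_scaled[["pk"]])

# The column count now matches the scaler, so inverse_transform works.
data_out = pd.DataFrame(
    scaler_2.inverse_transform(X_train_scaled),
    columns=X_train_scaled.columns,
    index=X_train_scaled.index,
)
data_out["groep"] = uitkomst_2

print(data_out.head(5))
```

Because the scaler and the data passed to `inverse_transform` have the same columns, the unscaled values in `data_out` match the original training features again.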