from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(Xtrain, ytrain)
pred = lr.predict(Xtest)
pred
My ytest values are something like this:
Price_euros
248 675.0
556 255.0
693 2590.0
387 1369.0
781 2350.0
... ...
468 1699.0
508 1323.0
1187 691.0
1010 949.0
1053 979.0
But the predicted values are something like:
0 7.547000e+02
1 -7.503793e+10
2 2.169000e+03
3 -4.296977e+09
4 1.020596e+10
... ...
256 -7.759706e+09
257 -5.626814e+09
258 7.135000e+02
259 8.365000e+02
260 8.423000e+02
Is it a decimal problem? How do I round off the predicted values? And the predictions shouldn't be negative, should they?
MSE is 6.255155054767432e+20.
I don't think this is correct.
CodePudding user response:
Linear regression is an affine model, in the sense that the prediction is of the form
f(x) = <w, x> + b = SUM_i w_i x_i + b
What this means in practice is that there are always some inputs for which it will output negative values. This has nothing to do with what you train on; it is a property of linear models.
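A minimal sketch of this on toy data (not the asker's laptop-price dataset): even when every training target is positive, the fitted affine model f(x) = <w, x> + b goes negative as soon as the weighted sum drops below zero, for example when extrapolating past the training range:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: targets are strictly positive on the sampled range [0, 10],
# but the underlying trend is decreasing.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 100 - 8 * X.ravel() + rng.normal(0, 1, size=50)  # all positive here

lr = LinearRegression().fit(X, y)

print((y > 0).all())            # True: every training target is positive
print(lr.predict([[20.0]])[0])  # f(20) = <w, 20> + b is far below zero
```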
1 -7.503793e+10
2 2.169000e+03
3 -4.296977e+09
4 1.020596e+10
... ...
256 -7.759706e+09
257 -5.626814e+09
258 7.135000e+02
259 8.365000e+02
260 8.423000e+02
All the e+02 and e+03 results are very much in your data range, since these are hundreds and thousands. Now why are some really off the charts? Again, because it is a linear model, which literally multiplies each of your inputs by some weight and adds them up. It is a very rigid, limited class of models, and thus to minimise the error it sometimes needs to make huge mistakes.
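Predictions on the e+09/e+10 scale usually mean some of the learned weights are themselves enormous. One common way that happens is two input columns being near-duplicates of each other; a toy sketch of that mechanism (the columns and scales here are invented, not from the question's data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two almost identical columns: ordinary least squares can only tell
# them apart via noise, so it compensates with huge opposite-sign
# weights, and predictions on inputs where the columns disagree even
# slightly may be wildly off.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200)
X = np.column_stack([x, x + 1e-9 * rng.normal(size=200)])
y = 3 * x + rng.normal(0, 0.01, size=200)

lr = LinearRegression().fit(X, y)
print(lr.coef_)  # weights far larger than anything in y

# A point where the "duplicate" features disagree by a tiny amount:
print(lr.predict([[0.5, 0.5001]])[0])  # may be far from the expected ~1.5
```

As for the rounding asked in the question: `np.round(pred)` rounds the predictions and `np.clip(pred, 0, None)` forces them non-negative, but that only hides the problem; inspecting `lr.coef_` and dropping or rescaling redundant features addresses it.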