from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(Xtrain, ytrain)
pred = lr.predict(Xtest)
pred
My ytest values are something like this:
Price_euros
248 675.0
556 255.0
693 2590.0
387 1369.0
781 2350.0
... ...
468 1699.0
508 1323.0
1187 691.0
1010 949.0
1053 979.0
But the predicted values are something like:
0 7.547000e+02
1 -7.503793e+10
2 2.169000e+03
3 -4.296977e+09
4 1.020596e+10
... ...
256 -7.759706e+09
257 -5.626814e+09
258 7.135000e+02
259 8.365000e+02
260 8.423000e+02
Is it a decimal problem? How do I round off the predicted values? And the predictions shouldn't be negative, should they?
MSE is 6.255155054767432e+20.
I don't think this is correct.
CodePudding user response:
Linear regression is an affine model, in the sense that the prediction is of the form
f(x) = <w, x> + b = SUM_i w_i x_i + b
What this means in practice is that there are always some inputs for which it will output negative values. This has nothing to do with what you train on; it is a property of linear models.
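A minimal sketch of this on toy data (not the asker's laptop-price dataset): even when every training target is positive, the fitted affine model f(x) = <w, x> + b goes negative as soon as the weighted sum drops below zero, for example when extrapolating past the training range:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: targets are strictly positive on the sampled range [0, 10],
# but the underlying trend is decreasing.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 100 - 8 * X.ravel() + rng.normal(0, 1, size=50)  # all positive here

lr = LinearRegression().fit(X, y)

print((y > 0).all())            # True: every training target is positive
print(lr.predict([[20.0]])[0])  # f(20) = <w, 20> + b is far below zero
```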
1 -7.503793e+10
2 2.169000e+03
3 -4.296977e+09
4 1.020596e+10
... ...
256 -7.759706e+09
257 -5.626814e+09
258 7.135000e+02
259 8.365000e+02
260 8.423000e+02
All the e+02 and e+03 results are very much in your data range, since these are hundreds and thousands. Now why are some really off the charts? Again, because it is a linear model, which literally multiplies each of your inputs by some weight and adds them up. It is a very rigid, limited class of models, and thus to minimise the error it sometimes needs to make huge mistakes.
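Predictions on the e+09/e+10 scale usually mean some of the learned weights are themselves enormous. One common way that happens is two input columns being near-duplicates of each other; a toy sketch of that mechanism (the columns and scales here are invented, not from the question's data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two almost identical columns: ordinary least squares can only tell
# them apart via noise, so it compensates with huge opposite-sign
# weights, and predictions on inputs where the columns disagree even
# slightly may be wildly off.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200)
X = np.column_stack([x, x + 1e-9 * rng.normal(size=200)])
y = 3 * x + rng.normal(0, 0.01, size=200)

lr = LinearRegression().fit(X, y)
print(lr.coef_)  # weights far larger than anything in y

# A point where the "duplicate" features disagree by a tiny amount:
print(lr.predict([[0.5, 0.5001]])[0])  # may be far from the expected ~1.5
```

As for the rounding asked in the question: `np.round(pred)` rounds the predictions and `np.clip(pred, 0, None)` forces them non-negative, but that only hides the problem; inspecting `lr.coef_` and dropping or rescaling redundant features addresses it.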