Home > Net >  Excel vs. Sci Kit Learn Linear Regression or scipy.stats Provide Different Slopes, Intercepts, R2 Va
Excel vs. Sci Kit Learn Linear Regression or scipy.stats Provide Different Slopes, Intercepts, R2 Va

Time:10-06

I cannot figure out why I get different values for slope, intercept, and r2 values from excel vs. scikit learn (or scipy.stats!). This is a very simple linear regression, literally six "x" values and six "y" values. I use Excel all the time for regression, but can't figure out why the numbers don't match.

import numpy as np
from scipy import stats as stats

my_ys = [0, 0.700287879, 1.475575758, 2.266727273, 2.973575758, 3.749106061]
my_xs = [0, 2.5, 5, 7.5, 10, 12.5]


array_y = np.array(my_ys)
array_x = np.array(my_xs)
slope, intercept, rvalue, pvalue, stderr = stats.linregress(array_x, array_y)

my_r2 = rvalue**2

print("r2 = ", my_r2)
print("intercept is: ", intercept)
print("slope is: ", slope)

Produces:

r2 =  0.9997292041335445
intercept is:  -0.021731601619047636
slope is:  0.3012176623657143

Similarly:

from sklearn.linear_model import LinearRegression
import numpy as np 

my_ys = [0, 0.700287879, 1.475575758, 2.266727273, 2.973575758, 3.749106061]
my_xs = [0, 2.5, 5, 7.5, 10, 12.5]

array_y = np.array(my_ys)
array_x = np.array(my_xs).reshape((-1,1))

#create the model
my_model = LinearRegression().fit(array_x, array_y)

#find the fit
r2_of_standard_regression = my_model.score(array_x, array_y)

#give the intercept or b value of the regression
b_intercept_of_standard_regression = my_model.intercept_

#give the slope or m of the y=mx b
m_slope_of_standard_regression = my_model.coef_


print("r2 = ", r2_of_standard_regression)
print("intercept is: ", b_intercept_of_standard_regression)
print("slope is: ", m_slope_of_standard_regression)

Produces:

r2 =  0.999729204133545
intercept is:  -0.021731601619047636
slope is:  [0.30121766]

However, when I use Excel (either Linest, or just a fit to a scatter plot), I get different values:

Excel data with updated x values

CodePudding user response:

Maybe it's because Excel, by default, uses up to maximum of 100 iterations, or a maximum change of 0.001. Changing the latter to 0.0001 (or less) would probably solve your problem. BTW, I'm using LibreOffice and its results are the same as python. Also, from the attached picture, it seems y value for x=7.5 ,10 ,12.5 may differ, albeit slightly, from the values given in your python code, unless they're shortened because of your cell width size.

  • Related