# REGRESSION ANALYSIS
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# splitting the dataset into x and y variables
firm1 = pd.DataFrame(firm, columns=['Sales', 'Advert', 'Empl', 'Prod'])
print(firm1)
x = firm1.drop(['Sales'], axis=1)
y = firm1['Sales']
print(x)
print(y)
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
#the LR model
M = linear_model.LinearRegression(fit_intercept=True)
M.fit(x_train, y_train)
y_pred = M.predict(x_test)
print(y_pred)
print('Coeff: ', M.coef_)
for i in M.coef_:
    print('{:.4f}'.format(i))
print('Intercept: ','{:.4f}'.format(M.intercept_))
print('MSE: ','{:.4f}'.format(mean_squared_error(y_test, y_pred)))
print('Coefficient of determination (r2): ','{:.4f}'.format(r2_score(y_test, y_pred)))
print(firm1.sample())
This is my linear regression model. Every time I run the code, I get a different set of coefficients for the x variables and a different intercept, so I cannot get a constant equation. Is that normal?
Coeff:  [454.83981664  63.77031531  59.31844506]
454.8398
63.7703
59.3184
Intercept:  -1073.5124
MSE:  434529.9361
Those are the values (coefficients, intercept and mean squared error). However, when I run it again, I get a different output, shown below:
Coeff:  [462.0304152   61.17909189  269.41075305]
462.0304
61.1791
269.4108
Intercept:  -1462.2449
MSE:  4014768.0049
CodePudding user response:
It is normal to see different coefficients on each run, but not because of the model itself: scikit-learn's LinearRegression solves the ordinary least squares problem in closed form, so fitting it on the same data always produces the same coefficients. The variation comes from train_test_split, which draws a different random train/test split on every run, so the model is fitted on different data each time (notice that your MSE also changes between runs for the same reason). To make the results reproducible, fix the split with the random_state parameter:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

Equivalently, calling numpy.random.seed(42) before the split also pins the global random generator, but passing random_state is the idiomatic scikit-learn way.
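A minimal sketch of the point above, using synthetic data since the original `firm` dataset isn't shown: with a fixed `random_state`, the split, and therefore the fitted coefficients, are identical on every run.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the `firm` data (hypothetical column names
# copied from the question).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=['Advert', 'Empl', 'Prod'])
df['Sales'] = 2.0 * df['Advert'] + 0.5 * df['Empl'] - df['Prod'] \
    + rng.normal(scale=0.1, size=100)

x = df.drop(['Sales'], axis=1)
y = df['Sales']

# Fit twice with the same random_state: identical splits, identical fits.
coefs = []
for _ in range(2):
    x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.2,
                                              random_state=42)
    m = LinearRegression(fit_intercept=True).fit(x_tr, y_tr)
    coefs.append(m.coef_.copy())

print(np.allclose(coefs[0], coefs[1]))  # True: same split -> same coefficients
```

Omitting `random_state` (or passing different values) is what makes the two runs in the question disagree.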