Apply coefficients from n degree polynomial to formula

I use a sklearn LinearRegression() estimator with 5 variables

['feat1', 'feat2', 'feat3', 'feat4', 'feat5']

in order to predict a continuous value.

Once fitted, the estimator returns the list of coefficient values and the bias:

# coef_ and intercept_ only exist after fitting
linear = LinearRegression().fit(X_train, y_train)
print(linear.coef_)
print(linear.intercept_)

[ 0.18799409 -0.05406106 -0.01327966 -0.13348129 -0.00614054]
-0.011064865422734674

Then, since I have each feature as a variable, I can hardcode the coefficients into a linear formula and estimate my values, like so:

val = ((0.18799409*feat1) - (0.05406106*feat2) - (0.01327966*feat3) - (0.13348129*feat4) - (0.00614054*feat5)) -0.011064865422734674
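
The hardcoded formula above is a dot product of the coefficient vector with the feature vector, plus the intercept, so it can be written with array indexing. A minimal sketch using the values printed above; the feature values are made up for illustration and are not from the original data:

```python
import numpy as np

# Coefficients and intercept as printed by the fitted estimator above.
coef = np.array([0.18799409, -0.05406106, -0.01327966, -0.13348129, -0.00614054])
intercept = -0.011064865422734674

# Hypothetical feature values (feat1..feat5), for illustration only.
feats = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Same arithmetic as the hardcoded formula, without spelling out each term.
val = feats @ coef + intercept

# Term-by-term version, to confirm that both forms agree.
val_hardcoded = ((0.18799409 * feats[0]) - (0.05406106 * feats[1])
                 - (0.01327966 * feats[2]) - (0.13348129 * feats[3])
                 - (0.00614054 * feats[4])) - 0.011064865422734674
```

With a fitted estimator, coef and intercept would simply be linear.coef_ and linear.intercept_.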

Now let's say I use a polynomial regression of degree 2, using a pipeline, and print the coefficients and intercept:

model = Pipeline(steps=[
    ('scaler',StandardScaler()),
    ('polynomial_features', PolynomialFeatures(degree=degree, include_bias=False)), 
    ('linear_regression', LinearRegression())])

#fit model
model.fit(X_train, y_train)

print(model['linear_regression'].coef_)
print(model['linear_regression'].intercept_)

I get:

[ 7.06524186e-01 -2.98605001e-02 -4.67175212e-02 -4.86890790e-01
 -1.06320101e-02 -2.77958604e-03 -3.38253025e-04 -7.80563090e-03
  4.51356888e-03  8.32036733e-03  3.57638244e-02 -2.16446849e-02
 -7.92169287e-02  3.36809467e-02 -6.60531497e-03  2.16613331e-02
  2.10097993e-02  3.49970303e-02 -3.02970698e-02 -7.81462599e-03]
0.011042927069084668

How do I transform the formula above to calculate val from the regression, with values from .coef_ and .intercept_, using array indexing instead of hardcoding the values, for any degree 'n'?

Is there any scipy or numpy method suited for that?

CodePudding user response：

Polynomial regression is just linear regression on an expanded set of features, so all we need to do is transform the input data consistently. For any degree N, PolynomialFeatures from sklearn.preprocessing performs that expansion. Using dummy data, we can see how this works:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# set parameters
X = np.stack([np.arange(i, i + 10) for i in range(5)]).T  # 10 samples, 5 features
Y = np.random.randn(10) * 10 + 3
N = 2

poly_reg = PolynomialFeatures(degree=N, include_bias=False)
X_poly = poly_reg.fit_transform(X)
# print(X[0], X_poly[0])  # to check the expansion; with include_bias=False no constant column of 1s is added, LinearRegression fits the intercept itself

poly = LinearRegression().fit(X_poly, Y)

And thus, we can get the coef_ the way you were doing before, and simply perform a matrix multiplication to get the regressed value.

new_dat = poly_reg.transform(np.arange(2, 2 + 10, 2)[None])  # one new data point with 5 features
# allclose rather than exact equality, to tolerate floating-point rounding
np.testing.assert_allclose(poly.predict(new_dat), new_dat @ poly.coef_ + poly.intercept_)
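
If the model is the Pipeline from the question, the StandardScaler step has to be applied before the polynomial expansion, otherwise the coefficients are multiplied with unscaled inputs. A sketch on random dummy data (the data and the names rng, x_new, x_scaled, x_poly and reg are assumptions for illustration, not from the question):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))   # dummy data: 50 samples, 5 features
y_train = rng.normal(size=50)

model = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('polynomial_features', PolynomialFeatures(degree=2, include_bias=False)),
    ('linear_regression', LinearRegression())])
model.fit(X_train, y_train)

x_new = rng.normal(size=(1, 5))      # one new sample

# Repeat the pipeline's preprocessing by hand: scale first, then expand.
x_scaled = model['scaler'].transform(x_new)
x_poly = model['polynomial_features'].transform(x_scaled)

# Apply the coefficients by matrix multiplication.
reg = model['linear_regression']
val = x_poly @ reg.coef_ + reg.intercept_
```

Slicing the pipeline as model[:-1] and calling transform on it should give the same x_poly in one step, since it runs every step except the final regressor.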

----EDIT----

In case you cannot use the transform of PolynomialFeatures, generating the expanded features from your list of features is just an iterated loop over combinations with replacement.

new_feats = np.array([feat1,feat2,feat3,feat4,feat5])

from itertools import combinations_with_replacement
def gen_poly_feats(x, N):
    # returns the product of x over every unique grouping (w/ replacement) of its indices, for each degree 1..N, in the same order as PolynomialFeatures
    return np.concatenate([[np.prod(x[list(i)]) for i in combinations_with_replacement(range(len(x)), n)] for n in range(1, N + 1)])[None]

new_feats_poly = gen_poly_feats(new_feats, N)
# just to be sure that this matches...
np.testing.assert_allclose(new_feats_poly, poly_reg.transform(new_feats[None]))
# then we can use the above linear regression model to predict the new data
val = new_feats_poly @ poly.coef_ + poly.intercept_