Ridge Regression Based on sklearn

Time:12-19

Using NumPy, I create data x and labels y to train a ridge regression model, then use another independently created x and y to evaluate the predictions. Only 14 of 64 predictions are correct, and I don't know where the problem is. Below is my code.

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import Ridge

# note: the sparse= keyword was renamed to sparse_output= in scikit-learn 1.2
one_hot = OneHotEncoder(sparse=False)

# training data: 64 samples, 40 features in [-1, 1), labels in {0, ..., 4}
x = np.random.rand(64, 40) * 2 - 1
y = np.random.randint(0, 5, (64,))
y = one_hot.fit_transform(y.reshape(-1, 1))
clf = Ridge(alpha=1.0)
readout = clf.fit(x, y)

# test data generated the same way, independently of the training data
a = np.random.rand(64, 40) * 2 - 1
b = np.random.randint(0, 5, (64,))
b = one_hot.fit_transform(b.reshape(-1, 1))
y_hat = readout.predict(a)
y_hat = np.argmax(y_hat, axis=1)
target = np.argmax(b, axis=1)
correct = (y_hat == target).sum()

print(correct)     # 14

CodePudding user response:

For regression to work, one fundamental assumption is that X has some power to predict y.

In your case, both X and y are randomly generated:

x = np.random.rand(64,40) * 2 - 1
y = np.random.randint(0,5,(64,))

and thus X does NOT have any predictive power at all. In this scenario, ridge regression, or any fancier machine learning model, can do no better than random guessing, and that is exactly what you see. Since y = np.random.randint(0,5,(64,)) draws labels uniformly from the five classes {0, 1, 2, 3, 4}, a random guess is correct 20% of the time. Over 64 test points, chance alone predicts about 64 × 0.2 = 12.8 correct answers (standard deviation ≈ 3.2), so your 14/64 ≈ 0.219 is entirely consistent with guessing.
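To confirm the model itself is fine, here is a minimal sketch where y actually depends on x. The fixed projection W and the sample sizes are illustrative assumptions, and the one-hot encoding is done with plain NumPy to sidestep the OneHotEncoder sparse=/sparse_output= API change:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# a fixed projection makes y a deterministic function of x,
# so x now has genuine predictive power
W = rng.standard_normal((40, 5))

x = rng.uniform(-1, 1, (640, 40))
y = np.argmax(x @ W, axis=1)            # labels in {0, ..., 4}, derived from x
y_onehot = np.eye(5)[y]                 # one-hot targets via NumPy indexing

clf = Ridge(alpha=1.0).fit(x, y_onehot)

a = rng.uniform(-1, 1, (64, 40))
b = np.argmax(a @ W, axis=1)            # test labels from the same rule
y_hat = np.argmax(clf.predict(a), axis=1)
print((y_hat == b).mean())              # well above the 20% chance level
```

With the same Ridge model and the same argmax decoding, accuracy now rises far above chance, which shows the low score in the question comes from the data, not the code.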
