I have 10 days' worth of data on the number of burpees completed, and I want to extrapolate from it to estimate the total number of burpees that will have been completed after 20 days.
import pandas as pd
data = {'Day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'burpees': [12, 20, 28, 32, 52, 59, 71, 85, 94, 112]}
df = pd.DataFrame(data)
I have run sklearn LinearRegression on the data and extracted the coefficient:
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
mdl = reg.fit(df[['Day']], df[['burpees']])
mdl.coef_
How do I get an estimate of the number of burpees on day 20?
CodePudding user response:
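According to the sklearn documentation, the input X to the .fit() method should be array-like with shape (n_samples, n_features).
The same applies to .predict(), so the day you want to predict has to be passed as a 2-D array as well.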
Below should work:
import numpy as np
import pandas as pd
# your data
data = {
"Day": [1,2,3,4,5,6,7,8,9,10],
"burpees": [12, 20, 28, 32, 52, 59, 71, 85, 94, 112],
}
df = pd.DataFrame(data)
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
X = df["Day"].values.reshape(-1, 1)
y = df["burpees"].values.reshape(-1, 1)
mdl = reg.fit(X, y)
print("Intercept, coef:", mdl.intercept_, mdl.coef_)
prediction_data = np.array([[20]])  # one sample, one feature: shape (1, 1)
print("Prediction:", mdl.predict(prediction_data)[0][0])
Output:
Intercept, coef: [-4.4] [[11.07272727]]
Prediction: 217.0545454545455
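As a sanity check, the same numbers can be reproduced without sklearn. This is a minimal sketch that assumes an ordinary least-squares fit with a single feature. The variable names (days, sxy, sxx and so on) are only illustrative and are not part of any library. It shows that the prediction is simply the fitted line, intercept + slope * day, evaluated at day 20.

```python
# Same data as in the question
days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
burpees = [12, 20, 28, 32, 52, 59, 71, 85, 94, 112]

n = len(days)
mean_x = sum(days) / n
mean_y = sum(burpees) / n

# Ordinary least squares with one feature:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, burpees))
sxx = sum((x - mean_x) ** 2 for x in days)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# A prediction is the fitted line evaluated at the new x value
day = 20
prediction = intercept + slope * day

print("Intercept, coef:", intercept, slope)
print("Prediction:", prediction)
```

The intercept, slope and prediction should agree with the sklearn output above, up to floating-point rounding. Keep in mind that day 20 lies well outside the observed range of days 1 to 10, so the estimate is only as good as the assumption that the linear trend continues.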