I am trying to fit a decision tree regressor to a dataset. The fit itself works, but when I test the model by calculating the mean squared error, I get the following error:
msee = mse(x_test, y_test)
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17480/3348210221.py in <module>
----> 1 msee = mse(x_test, y_test)
~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in mean_squared_error(y_true, y_pred, sample_weight, multioutput, squared)
436 0.825...
437 """
--> 438 y_type, y_true, y_pred, multioutput = _check_reg_targets(
439 y_true, y_pred, multioutput
440 )
~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in _check_reg_targets(y_true, y_pred, multioutput, dtype)
103
104 if y_true.shape[1] != y_pred.shape[1]:
--> 105 raise ValueError(
106 "y_true and y_pred have different number of output ({0}!={1})".format(
107 y_true.shape[1], y_pred.shape[1]
ValueError: y_true and y_pred have different number of output (4!=1)
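Here is the model code and the head of the DataFrame I am training the model on:
import numpy as np
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
# bat is the DataFrame shown below; TB_x is the feature, TB_y the target
x = np.array(bat[["TB_x"]])
y = np.array(bat[["TB_y"]])
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
dt = DecisionTreeRegressor(max_depth=10, random_state=1, min_samples_leaf=0.1)
dt.fit(x_train.reshape(-1, 1), y_train.reshape(-1, 1))
y_pred = dt.predict(x_test)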
index Year Age_x AgeDif_x Tm_x Lg_x Lev_x G_x PA_x AB_x ... BA_y OBP_y SLG_y OPS_y TB_y GDP_y HBP_y SH_y SF_y IBB_y
0 19 2019 22.0 1.5 UCLA P12 NCAA 38.0 72.0 58.0 ... 0.179 0.364 0.194 0.558 13.0 0.0 1.0 2.0 1.0 0.0
2 24 2020 23.0 1.7 St. Leo SSC NCAA 20.0 86.0 69.0 ... 0.156 0.309 0.219 0.527 14.0 0.0 2.0 0.0 2.0 0.0
6 45 2020 20.0 -0.8 Illinois BTen NCAA 13.0 58.0 47.0 ... 0.200 0.343 0.288 0.631 23.0 1.0 1.0 0.0 1.0 0.0
7 46 2020 20.0 -0.8 Illinois BTen NCAA 13.0 58.0 47.0 ... 0.156 0.309 0.219 0.527 14.0 0.0 2.0 0.0 2.0 0.0
8 49 2020 21.0 0.3 Miami (FL) ACC NCAA 16.0 69.0 54.0 ... 0.200 0.343 0.288 0.631 23.0 1.0 1.0 0.0 1.0 0.0
Answer:
From the documentation for mean_squared_error:
Parameters
y_true: array-like of shape (n_samples,) or (n_samples, n_outputs)
    Ground truth (correct) target values.
y_pred: array-like of shape (n_samples,) or (n_samples, n_outputs)
    Estimated target values.
So, instead of feeding in x_test and y_test, you need to feed in true and predicted y-values:
y_pred = dt.predict(x_test)
mse(y_test, y_pred)
or
mse(y_test, dt.predict(x_test))
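For reference, here is a minimal end-to-end sketch of the corrected flow. It does not use the original DataFrame: the arrays x and y are synthetic stand-ins generated with NumPy (an assumption made only so the example runs on its own), and the names test_mse and shape_error are made up for illustration. The last part reproduces the kind of ValueError shown in the question by passing two arrays whose column counts differ.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-ins for bat[["TB_x"]] and bat[["TB_y"]] (hypothetical data).
rng = np.random.default_rng(1)
x = rng.uniform(0, 100, size=(200, 1))           # one feature column
y = 0.8 * x[:, 0] + rng.normal(0, 5, size=200)   # 1-D target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=1
)

dt = DecisionTreeRegressor(max_depth=10, random_state=1, min_samples_leaf=0.1)
dt.fit(x_train, y_train)

# Pass the true targets first and the predictions second.
y_pred = dt.predict(x_test)
test_mse = mean_squared_error(y_test, y_pred)
print(test_mse)

# Arrays with different numbers of columns trigger the ValueError
# raised by the shape check seen in the traceback.
shape_error = None
try:
    mean_squared_error(np.zeros((5, 4)), np.zeros((5, 1)))
except ValueError as err:
    shape_error = err
print(shape_error)
```

The metric never looks at the features, so x_test only ever goes into dt.predict, not into the error function.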