Having trouble calculating mean squared error in sklearn python


I am trying to fit a decision tree regressor to a dataset. The fit works, but when I test it by calculating the mean squared error, I get an error that looks like this:

msee = mse(x_test, y_test)

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17480/3348210221.py in <module>
----> 1 msee = mse(x_test, y_test)

~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in mean_squared_error(y_true, y_pred, sample_weight, multioutput, squared)
    436     0.825...
    437     """
--> 438     y_type, y_true, y_pred, multioutput = _check_reg_targets(
    439         y_true, y_pred, multioutput
    440     )

~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in _check_reg_targets(y_true, y_pred, multioutput, dtype)
    103 
    104     if y_true.shape[1] != y_pred.shape[1]:
--> 105         raise ValueError(
    106             "y_true and y_pred have different number of output ({0}!={1})".format(
    107                 y_true.shape[1], y_pred.shape[1]

ValueError: y_true and y_pred have different number of output (4!=1)

Here is the model code and the head of the DataFrame I am training the model on:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

x = np.array(bat[["TB_x"]])
y = np.array(bat[["TB_y"]])

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=1)
dt = DecisionTreeRegressor(max_depth=10, random_state=1, min_samples_leaf=.1)
dt.fit(x_train.reshape(-1,1), y_train.reshape(-1,1))
y_pred = dt.predict(x_test)


    index   Year    Age_x   AgeDif_x    Tm_x    Lg_x    Lev_x   G_x PA_x    AB_x    ... BA_y    OBP_y   SLG_y   OPS_y   TB_y    GDP_y   HBP_y   SH_y    SF_y    IBB_y
0   19  2019    22.0    1.5 UCLA    P12 NCAA    38.0    72.0    58.0    ... 0.179   0.364   0.194   0.558   13.0    0.0 1.0 2.0 1.0 0.0
2   24  2020    23.0    1.7 St. Leo SSC NCAA    20.0    86.0    69.0    ... 0.156   0.309   0.219   0.527   14.0    0.0 2.0 0.0 2.0 0.0
6   45  2020    20.0    -0.8    Illinois    BTen    NCAA    13.0    58.0    47.0    ... 0.200   0.343   0.288   0.631   23.0    1.0 1.0 0.0 1.0 0.0
7   46  2020    20.0    -0.8    Illinois    BTen    NCAA    13.0    58.0    47.0    ... 0.156   0.309   0.219   0.527   14.0    0.0 2.0 0.0 2.0 0.0
8   49  2020    21.0    0.3 Miami (FL)  ACC NCAA    16.0    69.0    54.0    ... 0.200   0.343   0.288   0.631   23.0    1.0 1.0 0.0 1.0 0.0

CodePudding user response:

From the documentation:

y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
    Ground truth (correct) target values.

y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
    Estimated target values.
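As a quick illustration of that signature (toy numbers, not related to your data), both arguments are target values of the same length:

from sklearn.metrics import mean_squared_error as mse

y_true = [3.0, 0.5, 2.0, 7.0]   # ground-truth targets
y_pred = [2.5, 0.0, 2.0, 8.0]   # model predictions
print(mse(y_true, y_pred))      # 0.375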

So, instead of feeding in x_test and y_test, you need to feed in true and predicted y-values:

 y_pred = dt.predict(x_test)
 mse(y_test, y_pred)

or

mse(y_test, dt.predict(x_test))
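Putting it together with the code from the question (the bat DataFrame and the TB_x / TB_y columns come from the post, and this assumes bat is already loaded), a sketch of the corrected workflow could look like this:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error as mse

# bat is the DataFrame from the question, assumed to be loaded already
x = np.array(bat[["TB_x"]])   # feature column
y = np.array(bat[["TB_y"]])   # target column

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=1)

dt = DecisionTreeRegressor(max_depth=10, random_state=1, min_samples_leaf=.1)
dt.fit(x_train, y_train.ravel())   # x is already 2-D, so no reshape is needed

y_pred = dt.predict(x_test)        # predictions for the held-out rows
msee = mse(y_test, y_pred)         # true targets first, predictions second
print(msee)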