I have been learning about decision trees and how to build them in sklearn. Every attempt so far has ended in a ValueError that reads
"The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". Here is the full error:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15136/2104431115.py in <module>
2 dt = DecisionTreeRegressor(max_depth= 5, random_state= 1, min_samples_leaf=.1)
3 dt.fit(x_train.reshape(-1,1), y_train.reshape(-1,1))
----> 4 y_pred = dt.predict(x_test, y_test)
~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in predict(self, X, check_input)
465 """
466 check_is_fitted(self)
--> 467 X = self._validate_X_predict(X, check_input)
468 proba = self.tree_.predict(X)
469 n_samples = X.shape[0]
~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
430 def _validate_X_predict(self, X, check_input):
431 """Validate the training data on predict (probabilities)."""
--> 432 if check_input:
433 X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
434 if issparse(X) and (
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Here is all of my code so far for this model:
x = np.array(bat[["TB_x"]])
y = np.array(bat[["TB_y"]])
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= .2, random_state= 1)
dt = DecisionTreeRegressor(max_depth= 5, random_state= 1, min_samples_leaf=.1)
dt.fit(x_train.reshape(-1,1), y_train.reshape(-1,1))
y_pred = dt.predict(x_test, y_test)
Originally I was getting an error saying that a 2D array was expected but a 1D array was received. I solved that problem by using reshape, but now I get this ValueError, which I do not understand.
CodePudding user response:
This is a slight misunderstanding about how the predict function works. Conceptually, if you are trying to predict something, why would you need to pass in the expected labels?
For DecisionTreeRegressor, the signature of predict is predict(X, check_input=True). As with sklearn estimators in general, you only need to pass in the features, not the expected labels.
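You are calling y_pred = dt.predict(x_test, y_test), but the second argument that predict expects is just a boolean that lets you disable some sanity checks on x_test. Because y_test ends up in that position, the line if check_input: shown in the traceback tries to evaluate a whole array as a single boolean, which is what raises the ValueError.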
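To see where the message itself comes from, here is a minimal sketch that reproduces it with NumPy alone, without sklearn. The array named labels is a made-up stand-in for y_test; the if statement plays the role of the if check_input: line from the traceback.

```python
import numpy as np

# Hypothetical stand-in for y_test: any array with more than one element
labels = np.array([1.0, 2.0, 3.0])

message = ""
try:
    # Same kind of check as `if check_input:` inside predict.
    # Python asks the array for a single True/False value, and NumPy refuses.
    if labels:
        pass
except ValueError as err:
    message = str(err)

print(message)
```

This is why the error mentions a.any() and a.all(): NumPy cannot decide on its own whether a multi-element array should count as true or false.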
You just need to do the following instead:
y_pred = dt.predict(x_test)
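For a fuller picture, here is a sketch of the whole workflow under some assumptions. The bat DataFrame is not shown in the question, so the data below is synthetic and only stands in for the TB_x and TB_y columns. The labels are used once in fit and once more when scoring the predictions, never in predict.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for bat[["TB_x"]] and bat[["TB_y"]] (assumption: real data not available)
rng = np.random.default_rng(1)
x = rng.uniform(0, 300, size=(200, 1))          # 2D features: (n_samples, 1)
y = 0.8 * x[:, 0] + rng.normal(0, 20, size=200)  # 1D target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=1
)

dt = DecisionTreeRegressor(max_depth=5, random_state=1, min_samples_leaf=0.1)
dt.fit(x_train, y_train)  # labels are needed here, for training

y_pred = dt.predict(x_test)  # features only, no labels

# The held-out labels come back in when evaluating the predictions
mse = mean_squared_error(y_test, y_pred)
print(y_pred.shape, mse)
```

Since x is already two-dimensional here, no reshape is needed before fit or predict.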
You can refer to the sklearn documentation for DecisionTreeRegressor for more information.