I am working on a regression model (Decision Tree) on multidimensional data with 16 features. The model's r2_score is 0.97, but the plot of y test against y predict looks completely wrong: the range of x is not the same.
Would you please tell me what the problem is?
I have also tried fitting the model on a single feature to check the x range in the diagram, but that obviously just lowers the score, and the diagram is still odd!
CodePudding user response:
Matplotlib's plot function draws a single line by connecting the points in the order in which they are given. You are seeing a mess because the points are not ordered along the x-axis.
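To make the ordering issue concrete, here is a minimal sketch on synthetic 1D data (the arrays `x` and `y` and the sine target are made up for illustration, not taken from the question). It assumes a single feature, which is the only case where sorting along the x-axis is possible: the left panel joins the points in array order, the right panel joins them after sorting with `np.argsort`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, unsorted 1D feature and target (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = np.sin(x)

# Indices that would sort x in ascending order
order = np.argsort(x)

fig, (ax_raw, ax_sorted) = plt.subplots(1, 2, figsize=(12, 4))

# Points are connected in array order, so the line jumps back and forth
ax_raw.plot(x, y)
ax_raw.set_title("x in original order")

# Points are connected left to right, so the line follows the curve
ax_sorted.plot(x[order], y[order])
ax_sorted.set_title("x sorted with np.argsort")

if __name__ == "__main__":
    plt.show()
```

The same index array is applied to both `x` and `y`, so each point keeps its own target value; only the drawing order changes.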
In a regression model, you have a function f(x) -> R, where f here is your decision tree and x is in the 16-dimensional space. However, you cannot order your x, which has 16 dimensions, along the x-axis.
Instead, what you can do is plot the ground truth and predicted values for each index as a scatter plot:
import numpy as np
import matplotlib.pyplot as plt
# Here, I'm assuming y_DT_5 holds the model's predictions and is either a 1D array or a column vector.
# If necessary, change the argument of np.arange accordingly to get the number of values
idxs = np.arange(len(y_DT_5))
plt.figure(figsize=(16,4))
plt.scatter(x=idxs, y=y_DT_5, marker='x', label='predicted') # Plot each predicted value as a separate point
plt.scatter(x=idxs, y=y_test, marker='.', label='ground truth') # Plot each ground truth value as a separate point
plt.legend()
plt.show()
If your model works, the two points plotted at each index should be close along the y-axis.
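Another common way to check a regression model, regardless of how many features it has, is to plot the predicted values against the true values and compare them with the diagonal. The sketch below uses synthetic stand-ins (`y_true` and `y_pred` are hypothetical names with made-up data, not the arrays from the question); with real data you would pass your test targets and your model's predictions instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins: y_true plays the role of the test targets,
# y_pred the role of the model's predictions (true value plus noise)
rng = np.random.default_rng(0)
y_true = rng.uniform(0, 100, size=200)
y_pred = y_true + rng.normal(0, 5, size=200)

fig, ax = plt.subplots(figsize=(5, 5))

# One point per sample: x is the true value, y is the prediction
ax.scatter(y_true, y_pred, marker='.', alpha=0.6)

# Diagonal y = x: a perfect model would put every point on this line
lo = min(y_true.min(), y_pred.min())
hi = max(y_true.max(), y_pred.max())
ax.plot([lo, hi], [lo, hi], color='k', linestyle='--', label='perfect prediction')

ax.set_xlabel('true value')
ax.set_ylabel('predicted value')
ax.legend()

if __name__ == "__main__":
    plt.show()
```

Points that hug the dashed line indicate good predictions. Because neither axis depends on the order of the samples, this plot does not suffer from the ordering problem described above.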