Why do some contour lines have two different labels?

Time:02-03

Does my code have a bug or something else?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
file = 'https://aegis4048.github.io/downloads/notebooks/sample_data/unconv_MV_v5.csv'
myDF = pd.read_csv(file)
# Split the data into features and target
feature1 = "Brittle"
feature2 = "Por"
X = myDF[[feature1, feature2]]                #.iloc[:, :-1].values  # A NumPy array!
X.info()  # prints the summary itself and returns None, so it is not wrapped in print()
y = myDF["Prod"]         #.iloc[:, -1].values

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create a linear regression object
reg = LinearRegression()

# Fit the model to the training data
reg.fit(X_train, y_train)

# Predict the target variable using the test data
y_pred = reg.predict(X_test)

# Evaluate the model using mean squared error (MSE)
mse = np.mean((y_test - y_pred)**2)
print("Mean Squared Error: ", mse)

print("R2 Score:", reg.score(X_test, y_test))
#define figure size in (width, height) for all plots
plt.rcParams['figure.figsize'] = [10, 7]
# Create a mesh of values for the features
print(X_train.shape)  # X_train is a pandas DataFrame, not a NumPy array
x1_min, x1_max = X_train[feature1].min(), X_train[feature1].max()
x2_min, x2_max = X_train[feature2].min(), X_train[feature2].max()
x1, x2 = np.meshgrid(np.linspace(x1_min, x1_max, 100), np.linspace(x2_min, x2_max, 100))
X_mesh = np.c_[x1.ravel(), x2.ravel()]

# Compute the predictions for the mesh of values
y_pred_mesh = reg.predict(X_mesh).reshape(x1.shape)

# Plot the predictions as a surface. Request 10 contour lines.
contours = plt.contourf(x1, x2, y_pred_mesh, 10, cmap='coolwarm', alpha=0.8)   # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contourf.html

# Scatter plot of the training data.
# The colors of the points don't mean much except to stand out from the background 
plt.scatter(X_train[feature1], X_train[feature2], c=y_train, cmap='coolwarm', s=20)

# Label the contour lines
plt.clabel(contours, inline=1, fontsize=12, colors = "black")

# Label the plot
plt.xlabel(feature1)
plt.ylabel(feature2)
plt.title('Multivariate Linear Regression Contour Plot')

# Show the plot
plt.show()

The output: Why do some contour lines have two different labels?

CodePudding user response:

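The two labels do not mean that the model predicts two values of "Prod" for the same "Brittle" and "Por". A fitted LinearRegression returns exactly one prediction for each pair of feature values, and its prediction surface is a plane, so neither non-linearity nor noise in the data can put two values on one contour line.

The more likely cause is that plt.clabel is being given the object returned by plt.contourf. Filled contours are stored as bands, each lying between a lower and an upper level, while clabel is meant for the line contours returned by plt.contour. When it is handed filled bands, the label for a band can be placed on either of the band's edges, and each interior edge is shared by two neighboring bands, so one boundary can end up showing two adjacent level values.

To fix it, keep contourf for the colored background, draw line contours at the same levels with plt.contour, and label those lines instead (this replaces the existing plt.clabel(contours, ...) call):

 # Draw line contours at the filled plot's levels and label the lines
 lines = plt.contour(x1, x2, y_pred_mesh, levels=contours.levels, colors="black", linewidths=0.5)
 plt.clabel(lines, inline=True, fontsize=12)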
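
One approach that should avoid mixed labels is to label line contours from plt.contour rather than the filled set from plt.contourf. The following is a minimal, self-contained sketch of that idea. It does not use the original dataset: the plane coefficients and grid ranges are made-up stand-ins for the fitted model's predictions, and the variable names filled, lines and labels are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the plot interactively
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for reg.predict on a mesh: a plane over a 100x100 grid.
# The ranges and coefficients below are assumptions, not values from the dataset.
x1, x2 = np.meshgrid(np.linspace(10, 85, 100), np.linspace(5, 25, 100))
z = 30.0 * x1 + 250.0 * x2 - 2000.0

fig, ax = plt.subplots(figsize=(10, 7))

# Filled bands provide the colored background only.
filled = ax.contourf(x1, x2, z, 10, cmap="coolwarm", alpha=0.8)

# Line contours at the same levels; these are the objects that get labeled.
lines = ax.contour(x1, x2, z, levels=filled.levels, colors="black", linewidths=0.5)
labels = ax.clabel(lines, inline=True, fontsize=12)

# A colorbar gives the band values without any labels on the plot itself.
fig.colorbar(filled, ax=ax, label="prediction")
```

Because every label now belongs to a single line at a single level, each contour should carry only one value. In the question's script, the same pattern would apply with y_pred_mesh in place of z.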

In any case, the contour plot is a useful visualization tool for understanding the relationship between the features and the target in a linear regression model.
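
For a two-feature linear model, the shape of the contours can be worked out directly. The symbols below are notation introduced here for illustration (intercept \beta_0, coefficients \beta_1 and \beta_2, contour level c); they do not appear in the original code.

```latex
\hat{y}(x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2
\qquad\Longrightarrow\qquad
\hat{y} = c \iff x_2 = \frac{c - \beta_0 - \beta_1 x_1}{\beta_2}
\quad (\beta_2 \neq 0)
```

Each level c therefore gives one straight line with slope -\beta_1 / \beta_2, so all contours are parallel and every line corresponds to exactly one predicted value.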
