Home > database >  Plotting class decision boundary: determine a "good fit" range directly
Plotting class decision boundary: determine a "good fit" range directly

Time:05-11

I am trying to figure out how to plot the decision boundary line picking just few middle values instead of the entirety of the separator line such that it spans roughly the y-range that also the observations span.

Currently, I manually repeatedly select different bounds and assess visually, until "a good looking separator" emerged.

MWE:


from collections import Counter
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC 

# sample data
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.9], flip_y=0, random_state=1)

# fit svm model 
svc_model = SVC(kernel='linear', random_state=32)
svc_model.fit(X, y)

# Constructing a hyperplane using a formula.
w = svc_model.coef_[0]           
b = svc_model.intercept_[0]      
x_points = np.linspace(-1, 1)   
y_points = -(w[0] / w[1]) * x_points - b / w[1]  

Figure 1:

decision boundary line spans of a much larger range, causing the observations to be "squished" on what visually looks like almost a line

plt.figure(figsize=(10, 8))

# Plotting our two-features-space
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=50)

# Plotting a red hyperplane
plt.plot(x_points, y_points, c='r')

figure 1

Figure 2

Manually play around with points to determine good fit visually (x_points[19:-29], y_points[19:-29] ):

plt.figure(figsize=(10, 8))

# Plotting our two-features-space
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=50)
# Plotting a red hyperplane
plt.plot(x_points[19:-29], y_points[19:-29], c='r')

enter image description here

How can I automate "good fit" range of values? Foe example, this works fine with n_samples=100 data points but not with n_samples=1000.

CodePudding user response:

Instead of having x_points go from -1 to 1, you could invert the linear equation and specify the bounds you want on y_points directly:

y_points = np.linspace(X[:, 1].min(), X[:, 1].max())
x_points = -(w[1] * y_points   b) / w[0]
  • Related