Say I have an imbalanced binary dataset like so:
from collections import Counter
from sklearn.datasets import make_classification
from matplotlib import pyplot as plt
from imblearn.over_sampling import SMOTE
# fake dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=1)
# summarize class distribution
counter = Counter(y)
print(counter)
Counter({0: 9900, 1: 100})
Using SMOTE to oversample the minority class:
oversample = SMOTE()
Xs, ys = oversample.fit_resample(X, y)
Now, to show a histogram of the class distribution:
a. before oversampling:
plt.hist(y)
b. after oversampling:
plt.hist(ys)
However, in the plot of the oversampled data I would like to show the generated (synthetic) portion of the minority class in a different color.
Expected output:
Similar to the figure below:
CodePudding user response:
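One way to get this, as a rough sketch rather than a definitive answer: draw a stacked bar chart instead of plt.hist, with the original counts at the bottom and the generated counts on top in a second color. This assumes that the resampled labels start with the original labels in their original order and that every later entry is synthetic. As far as I can tell that is how SMOTE's fit_resample currently lays out its output, but it is not a documented guarantee, so the helper below checks it. The names split_counts and plot_generated_portion are made up for this example, and the label arrays are stand-ins with the same counts as in the question so that the snippet runs without imblearn.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
from matplotlib import pyplot as plt


def split_counts(y_before, y_after):
    """Return (classes, original_counts, generated_counts).

    Assumption: y_after starts with y_before, and every entry after that
    is a generated sample.
    """
    y_before = np.asarray(y_before)
    y_after = np.asarray(y_after)
    n_orig = len(y_before)
    if not np.array_equal(y_after[:n_orig], y_before):
        raise ValueError("resampled labels do not start with the original labels")
    classes = np.unique(y_after)
    original = np.array([(y_before == c).sum() for c in classes])
    generated = np.array([(y_after[n_orig:] == c).sum() for c in classes])
    return classes, original, generated


def plot_generated_portion(y_before, y_after, ax=None):
    """Stacked bar chart: original samples below, generated samples on top."""
    classes, original, generated = split_counts(y_before, y_after)
    if ax is None:
        _, ax = plt.subplots()
    positions = np.arange(len(classes))
    ax.bar(positions, original, color="tab:blue", label="original")
    ax.bar(positions, generated, bottom=original, color="tab:orange", label="generated")
    ax.set_xticks(positions)
    ax.set_xticklabels([str(c) for c in classes])
    ax.set_xlabel("class")
    ax.set_ylabel("count")
    ax.legend()
    return ax


# Stand-in labels (hypothetical): 9900/100 before, 9800 generated minority samples after
y_demo = np.array([0] * 9900 + [1] * 100)
ys_demo = np.concatenate([y_demo, np.ones(9800, dtype=int)])

classes, original, generated = split_counts(y_demo, ys_demo)
print(classes, original, generated)
ax = plot_generated_portion(y_demo, ys_demo)
```

With the variables from the question, the call would be plot_generated_portion(y, ys) followed by plt.show() (after removing the Agg line). The same index split, Xs[len(X):], should also give the synthetic points themselves if you want to color them in a scatter plot, under the same ordering assumption.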