Does Y have to be one-hot encoded or not? For example, in this code:
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)
CodePudding user response:
Y should be a single column of your DataFrame.
CodePudding user response:
No, y should usually not be one-hot encoded.
In general, one-hot encoding or dummy encoding is used to encode features (columns in X), not the target y.
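As a rough sketch of that split (the toy data and the column names "color", "size" and "label" are made up for illustration), a categorical feature can be one-hot encoded with pandas.get_dummies while the class labels in y are passed to fit unchanged:

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: one categorical feature, one numeric feature,
# and a column of string class labels.
df = pd.DataFrame({
    "color": ["red", "red", "blue", "blue", "red", "blue"],
    "size": [1.0, 1.2, 3.1, 2.9, 0.8, 3.3],
    "label": ["small", "small", "large", "large", "small", "large"],
})

# One-hot encode the categorical feature column only.
X = pd.get_dummies(df[["color", "size"]], columns=["color"], dtype=float)

# The target stays a single 1d column of class labels, not one-hot encoded.
y = df["label"]

clf = GaussianNB()
clf.fit(X, y)
pred = clf.predict(X)  # predictions come back as the original string labels
```

Here X ends up with three columns (size plus one column per color), while y is still one column with one label per row.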
In machine learning tasks, y is very often a single column from your data, represented as a 1d array, a list, or a Pandas Series. Notice the example in the docs:
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)
y is usually written with a lower-case letter (though not always, as you see above), because conventional mathematical notation uses upper-case letters like X for matrices (often represented as 2d arrays in NumPy, or as a DataFrame in Pandas), and lower-case letters for vectors (1d arrays, or Series in Pandas).
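If your target has already been one-hot encoded, one possible way to get back to a 1d vector is to take the index of the 1 in each row. This is only a sketch: Y_onehot is a made-up array, and the resulting labels are class indices (0, 1) rather than the 1 and 2 used in the docs example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Hypothetical one-hot target: each row marks one of two classes.
Y_onehot = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])

# Collapse the 2d matrix (6, 2) into a 1d vector of class indices (6,).
y = np.argmax(Y_onehot, axis=1)

clf = GaussianNB()
clf.fit(X, y)  # fit expects y with shape (n_samples,)
pred = clf.predict([[-0.8, -1]])
```

With these inputs, the point [-0.8, -1] falls on the side of the first three samples, so it is assigned class index 0.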
CodePudding user response:
Did you check the documentation? https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html?highlight=gaussiannb#sklearn.naive_bayes.GaussianNB.fit