Download the dataset, where the first four columns are features, and the last column corresponds to categories (3 labels). Perform the following tasks.
1. Split the dataset into train and test sets (80:20).
2. Construct the Naive Bayes classifier from scratch and train it on the train set. Assume a Gaussian distribution to compute the probabilities.
3. Evaluate the performance on the test set using the following metrics:
   a. Confusion matrix
   b. Overall and class-wise accuracy
   c. ROC curve, AUC
4. Use any library (e.g. scikit-learn) and repeat 1 to 3.
5. Compare and comment on the performance of the classifiers in 2 and 4.
6. Calculate the Bayes risk. Consider

   $$\lambda = \begin{bmatrix} 2 & 1 & 6 \\ 4 & 2 & 4 \\ 6 & 3 & 1 \end{bmatrix}$$

   where $\lambda$ is a loss function and rows and columns correspond to classes ($c_i$) and actions ($a_j$) respectively, e.g. $\lambda(a_3 \mid c_2) = 4$.
CodePudding user response:
It's not clear what specific part of the problem you're having trouble with, which makes it hard to give specific advice.
With that in mind, here is some reading that might help get you started:
- If the dataset is in CSV format, you can read it into a dataframe using pd.read_csv() as discussed here: https://www.geeksforgeeks.org/python-read-csv-using-pandas-read_csv/
- To split the df into a train set and test set, you can import scikit-learn (sklearn) and then use train_test_split() as discussed here: https://www.stackvidhya.com/train-test-split-using-sklearn-in-python/
- It sounds like your professor (or whoever is the source of this question) wants you to write a function that duplicates a Naive Bayes classifier, so I'll leave you to figure that out. Sklearn does provide a Naive Bayes classifier you can read about here and use to verify your results: https://scikit-learn.org/stable/modules/naive_bayes.html
- For confusion matrices, sklearn (again) provides some functionality that will let you plot a confusion matrix: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html#sklearn.metrics.ConfusionMatrixDisplay.from_predictions
- For the ROC curve, you can see here: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
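The evaluation steps in the last two bullets can be sketched with scikit-learn roughly as follows. This is only an illustration under some assumptions: the Iris data bundled with scikit-learn is used as a stand-in (it happens to have four feature columns and three classes, but the question does not say which dataset is meant), "class-wise accuracy" is read as per-class recall, and the AUC is the macro-averaged one-vs-rest value.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data (assumption): Iris has 4 feature columns and 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)  # one probability column per class

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

overall_accuracy = np.trace(cm) / cm.sum()
# Class-wise accuracy, read here as per-class recall: correct / true count.
classwise_accuracy = np.diag(cm) / cm.sum(axis=1)

# One-vs-rest AUC, macro-averaged over the three classes.
auc_ovr = roc_auc_score(y_test, y_score, multi_class="ovr")

print(cm)
print(overall_accuracy, classwise_accuracy, auc_ovr)
```

To draw the actual curves, `sklearn.metrics.roc_curve((y_test == k).astype(int), y_score[:, k])` should give the false and true positive rates for class `k`, which can then be plotted one class at a time.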
Hope this is enough to get you started.
CodePudding user response:
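from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# features: the first four columns; targets: the label column (both assumed already loaded)
X_train, X_test, y_train, y_test = train_test_split(features, targets, test_size=0.20, random_state=42)

# example of Gaussian naive Bayes: define the model
model = GaussianNB()
# fit the model
model.fit(X_train, y_train)
# predict on the test set and report per-class precision, recall and F1
predict = model.predict(X_test)
report = classification_report(y_test, predict)
print('Classification report:\n', report)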
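For the from-scratch part of the assignment, something along these lines might serve as a starting point to compare against GaussianNB. It is a minimal sketch, not a reference implementation: the class name `GaussianNaiveBayes`, the small variance floor of `1e-9`, and the synthetic demo data are all my own assumptions.

```python
import numpy as np


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes (illustrative sketch)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Per-class prior, feature means, and feature variances.
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant (an assumption) keeps variances away from zero.
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        # Broadcast to shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.means_[None, :, :]
        log_pdf = -0.5 * (np.log(2 * np.pi * self.vars_)[None, :, :]
                          + diff ** 2 / self.vars_[None, :, :])
        # Naive assumption: sum per-feature log densities, then add the log prior.
        return np.log(self.priors_)[None, :] + log_pdf.sum(axis=2)

    def predict_proba(self, X):
        jll = self._joint_log_likelihood(X)
        jll = jll - jll.max(axis=1, keepdims=True)  # stabilise before exponentiating
        probs = np.exp(jll)
        return probs / probs.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


# Tiny synthetic check (made-up data): three well-separated Gaussian blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [10, 10, 10, 10]])
X_demo = np.vstack([rng.normal(c, 1.0, size=(50, 4)) for c in centers])
y_demo = np.repeat([0, 1, 2], 50)

clf = GaussianNaiveBayes().fit(X_demo, y_demo)
demo_accuracy = (clf.predict(X_demo) == y_demo).mean()
print(demo_accuracy)
```

Working in log space avoids underflow when multiplying several small densities, and `predict_proba` gives the posteriors needed for the ROC curve and the Bayes risk.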
https://scikit-learn.org/stable/modules/cross_validation.html
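Neither answer touches the Bayes risk part of the question, so here is a rough sketch of one common reading of it: for each sample, compute the conditional risk of every action as the posterior-weighted loss, pick the action with the smallest risk, and average that minimum over the samples. The loss matrix is the one from the question; the two posterior rows are made-up values for illustration only, and your course may define the quantity slightly differently.

```python
import numpy as np

# Loss matrix from the question: rows are true classes c_i, columns are actions a_j.
loss = np.array([[2, 1, 6],
                 [4, 2, 4],
                 [6, 3, 1]], dtype=float)

# Made-up posteriors P(c_i | x) for two samples (illustrative values only).
posteriors = np.array([[0.20, 0.50, 0.30],
                       [0.05, 0.05, 0.90]])

# Conditional risk R(a_j | x) = sum_i loss[i, j] * P(c_i | x), one row per sample.
cond_risk = posteriors @ loss

best_action = cond_risk.argmin(axis=1)     # index of the risk-minimising action
bayes_risk = cond_risk.min(axis=1).mean()  # average minimum risk over the samples

print(cond_risk)
print(best_action, bayes_risk)
```

In practice `posteriors` would be the output of `predict_proba` on the test set. Note that the first column of this loss matrix is exactly twice the second, so action a1 can never be the minimiser here.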