How to perform classification on training and test dataset in Weka

Time:10-23

I am using Weka to build a classification model, and I am confused about how to use separate training and test sets. I split the dataset 60/40: 60% is saved to my hard disk as the training set, and the remaining 40% is saved to a separate file as the test set. The data is imbalanced, so I applied SMOTE to the training set. Then, in the Classify tab of Weka, I selected the Use training set option under Test options and ran the Random Forest classifier on the training set. After getting the result, I selected the Supplied test set option under Test options, loaded my test set from disk, and ran the classifier again.

I looked for a tutorial on how to load a training set and a test set in Weka but could not find one, so the process above is based on my own understanding.

Is this the right way to perform classification with separate training and test sets?

Thank you.

CodePudding user response:

There is no need to evaluate your classifier on the training set: the result will be overly optimistic, since the classifier has already seen this data. Just use the Supplied test set option; the classifier is then trained automatically on the currently loaded dataset before being evaluated on the specified test set.
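To see why training-set accuracy is optimistic, here is a minimal Python sketch rather than Weka code. It assumes a toy 1-nearest-neighbour classifier and made-up one-dimensional data (both are illustrative and not taken from the question). The classifier memorises its training rows, so it scores perfectly on them while doing noticeably worse on held-out rows.

```python
def predict_1nn(train_rows, x):
    """Return the label of the training row whose value is closest to x."""
    nearest = min(train_rows, key=lambda row: abs(row[0] - x))
    return nearest[1]


def accuracy(train_rows, eval_rows):
    """Fraction of eval_rows that a 1-NN model built on train_rows labels correctly."""
    correct = sum(1 for x, label in eval_rows if predict_1nn(train_rows, x) == label)
    return correct / len(eval_rows)


# Hypothetical toy data: (feature value, class label).
train = [(1.0, "neg"), (2.0, "neg"), (3.0, "pos"),
         (4.0, "neg"), (5.0, "pos"), (6.0, "pos")]
test = [(1.5, "neg"), (2.9, "neg"), (4.1, "pos"), (5.5, "pos")]

train_accuracy = accuracy(train, train)  # evaluated on data the model has seen
test_accuracy = accuracy(train, test)    # evaluated on held-out data

print(f"training-set accuracy: {train_accuracy:.2f}")
print(f"test-set accuracy:     {test_accuracy:.2f}")
```

On this toy data the training-set figure is 1.00 and the test-set figure is 0.50; only the second number says anything about unseen data.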

Instead of manually splitting your data, you could also use the Percentage split test option, with 60% to be used for your training data.
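Conceptually, a percentage split shuffles the rows and cuts them at the chosen percentage. The Python sketch below illustrates the idea only; the function name `percentage_split`, the seed, and the rounding rule are assumptions made for this example, and Weka's own shuffling and rounding may differ.

```python
import random


def percentage_split(rows, train_percent=60, seed=1):
    """Shuffle a copy of rows, then cut it into (train, test) by percentage."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten dataset rows
train, test = percentage_split(rows, train_percent=60)
print(len(train), len(test))
```

With ten rows and a 60% split, six rows go to training and four to testing, and no row appears in both.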

When using filters, you should always wrap them (in this case SMOTE) and your classifier (in this case RandomForest) in the FilteredClassifier meta-classifier. That way, you will ensure that the training and test set data will get transformed correctly. This will also avoid the problem of leaking information into the test set when transforming the full dataset with a supervised filter and splitting the dataset into train/test afterwards. Finally, it also documents nicely what preprocessing is being done to your input data, all in a single command-line string.
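The idea behind wrapping the filter and the classifier together can be sketched in plain Python. This is not Weka's API: `FilteredModel`, `RandomOversampler` (a simple duplication-based stand-in for SMOTE), and `NearestNeighbour` (a stand-in for RandomForest) are hypothetical names invented for this illustration. The point it shows is that resampling happens inside training only, and the test rows are scored exactly as supplied.

```python
import random
from collections import Counter


class RandomOversampler:
    """Stand-in for SMOTE: duplicates minority-class rows until classes are balanced."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def resample(self, rows):
        counts = Counter(label for _, label in rows)
        target = max(counts.values())
        balanced = list(rows)
        for label, count in counts.items():
            pool = [row for row in rows if row[1] == label]
            balanced.extend(self.rng.choice(pool) for _ in range(target - count))
        return balanced


class NearestNeighbour:
    """Toy 1-NN base classifier (stand-in for RandomForest)."""

    def fit(self, rows):
        self.rows = list(rows)

    def predict(self, x):
        return min(self.rows, key=lambda row: abs(row[0] - x))[1]


class FilteredModel:
    """Couples a resampling filter with a classifier so they are always used together."""

    def __init__(self, sampler, classifier):
        self.sampler = sampler
        self.classifier = classifier

    def fit(self, train_rows):
        # The resampling filter only ever sees training rows.
        self.training_rows = self.sampler.resample(train_rows)
        self.classifier.fit(self.training_rows)
        return self

    def accuracy(self, test_rows):
        # Test rows are scored as supplied: no resampling is applied to them.
        hits = sum(1 for x, label in test_rows if self.classifier.predict(x) == label)
        return hits / len(test_rows)


# Hypothetical imbalanced toy data: four "neg" rows, one "pos" row.
train = [(1.0, "neg"), (2.0, "neg"), (3.0, "neg"), (4.0, "neg"), (9.0, "pos")]
test = [(2.5, "neg"), (8.0, "pos")]

model = FilteredModel(RandomOversampler(), NearestNeighbour()).fit(train)
class_counts = Counter(label for _, label in model.training_rows)
print(class_counts)
print(model.accuracy(test))
```

Because the oversampler is called only from `fit`, there is no way to balance or otherwise alter the test set by accident, which is the guarantee the wrapper is meant to provide.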

If you need to apply more than one filter, use the MultiFilter to apply them sequentially.
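Sequential application simply means that the output of one filter becomes the input of the next. The Python sketch below illustrates that chaining; `multi_filter`, `drop_missing`, and `scale_to_unit` are hypothetical helpers written for this example and are not Weka filters.

```python
def multi_filter(*filters):
    """Return one filter that applies the given filters in order."""
    def apply(rows):
        for single_filter in filters:
            rows = single_filter(rows)
        return rows
    return apply


def drop_missing(rows):
    """Hypothetical filter: remove rows whose value is missing (None)."""
    return [row for row in rows if row[0] is not None]


def scale_to_unit(rows):
    """Hypothetical filter: rescale values linearly to the range [0, 1]."""
    values = [x for x, _ in rows]
    low, high = min(values), max(values)
    return [((x - low) / (high - low), label) for x, label in rows]


pipeline = multi_filter(drop_missing, scale_to_unit)
rows = [(2.0, "neg"), (None, "neg"), (4.0, "pos"), (6.0, "pos")]
result = pipeline(rows)
print(result)
```

Order matters: in this sketch the scaling step cannot handle a missing value, so the missing-value filter has to run first.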
