I'm trying to calculate the statistical significance of classifiers using the WEKA Java API. From the documentation, it looks like I need to use calculateStatistics from PairedCorrectedTTester, but I'm not sure how to use it.
Any ideas?
public static void main(String[] args) throws Exception {
    ZeroR zr = new ZeroR();
    Bagging bg = new Bagging();

    // set up the experiment
    Experiment exp = new Experiment();
    exp.setPropertyArray(new Classifier[0]);
    exp.setUsePropertyIterator(true);

    SplitEvaluator se = null;
    Classifier sec = null;
    se = new ClassifierSplitEvaluator();
    sec = ((ClassifierSplitEvaluator) se).getClassifier();

    // 10-fold cross-validation
    CrossValidationResultProducer cvrp = new CrossValidationResultProducer();
    cvrp.setNumFolds(10);
    cvrp.setSplitEvaluator(se);

    PropertyNode[] propertyPath = new PropertyNode[2];
    propertyPath[0] = new PropertyNode(
        se,
        new PropertyDescriptor("splitEvaluator", CrossValidationResultProducer.class),
        CrossValidationResultProducer.class
    );
    propertyPath[1] = new PropertyNode(
        sec,
        new PropertyDescriptor("classifier", se.getClass()),
        se.getClass()
    );

    exp.setResultProducer(cvrp);
    exp.setPropertyPath(propertyPath);

    // set classifiers here
    exp.setPropertyArray(new Classifier[]{zr, bg});

    // datasets
    DefaultListModel model = new DefaultListModel();
    File file = new File("dataset arff file");
    model.addElement(file);
    exp.setDatasets(model);

    // results are written to this file and read back below
    InstancesResultListener irl = new InstancesResultListener();
    irl.setOutputFile(new File("output.csv"));
    exp.setResultListener(irl);

    exp.initialize();
    exp.runExperiment();
    exp.postProcess();

    // statistical evaluation
    PairedCorrectedTTester tester = new PairedCorrectedTTester();
    Instances result = new Instances(new BufferedReader(new FileReader(irl.getOutputFile())));
    tester.setInstances(result);
    tester.setSortColumn(-1);
    tester.setRunColumn(result.attribute("Key_Run").index());
    tester.setFoldColumn(result.attribute("Key_Fold").index());
    tester.setResultsetKeyColumns(
        new Range(
            ""
            + (result.attribute("Key_Dataset").index() + 1)));
    tester.setDatasetKeyColumns(
        new Range(
            ""
            + (result.attribute("Key_Scheme").index() + 1)
            + ","
            + (result.attribute("Key_Scheme_options").index() + 1)
            + ","
            + (result.attribute("Key_Scheme_version_ID").index() + 1)));
    tester.setResultMatrix(new ResultMatrixPlainText());
    tester.setDisplayedResultsets(null);
    tester.setSignificanceLevel(0.05);
    tester.setShowStdDevs(true);
    tester.multiResultsetFull(0, result.attribute("Percent_correct").index());

    System.out.println("\nResult:");
    ResultMatrix matrix = tester.getResultMatrix();
    System.out.println(matrix.toStringMatrix());
}
Results from the code above: (image: results)
What I want is similar to the WEKA GUI (circled in red):
(image: Statistical Significance using WEKA GUI)
Resources Used:
- https://waikato.github.io/weka-wiki/experimenter/using_the_experiment_api/
- http://sce.carleton.ca/~mehrfard/repository/Case_Studies_(No_instrumentation)/Weka/doc/weka/experiment/PairedCorrectedTTester.html
Answer:
You have to swap the key columns for dataset and resultset if you want to statistically evaluate classifiers on datasets (rather than datasets on classifiers):
tester.setDatasetKeyColumns(
    new Range(
        ""
        + (result.attribute("Key_Dataset").index() + 1)));
tester.setResultsetKeyColumns(
    new Range(
        ""
        + (result.attribute("Key_Scheme").index() + 1)
        + ","
        + (result.attribute("Key_Scheme_options").index() + 1)
        + ","
        + (result.attribute("Key_Scheme_version_ID").index() + 1)));
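A side note on the index arithmetic: Range takes 1-based column numbers as a comma-separated string, while Attribute.index() is 0-based, so each index needs 1 added before the values are joined. If it helps readability, that conversion can be pulled into a small helper. This is only a sketch in plain Java; the class and method names (KeyColumnRanges, toOneBasedRange) are made up for illustration and are not part of WEKA.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of WEKA: builds the 1-based,
// comma-separated string that a Range constructor would be given.
final class KeyColumnRanges {

    // Converts 0-based column indices (e.g. from Attribute.index())
    // into a 1-based, comma-separated range string.
    static String toOneBasedRange(int... zeroBasedIndices) {
        StringJoiner joiner = new StringJoiner(",");
        for (int index : zeroBasedIndices) {
            if (index < 0) {
                // a negative index would mean the attribute was not found
                throw new IllegalArgumentException("negative column index: " + index);
            }
            joiner.add(Integer.toString(index + 1));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // 0-based indices 4, 5, 6 become the string "5,6,7"
        System.out.println(toOneBasedRange(4, 5, 6));
    }
}
```

With something like this, the resultset call would read tester.setResultsetKeyColumns(new Range(KeyColumnRanges.toOneBasedRange(schemeIdx, optionsIdx, versionIdx))), where the three index variables are the values returned by result.attribute(...).index().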
That will give you something like this when using the UCI dataset anneal:
Result:
Dataset                   (1) rules.ZeroR '' | (2) meta.Baggin
--------------------------------------------------------------
anneal                   (100)   76.17(0.55) |   98.73(1.12) v
--------------------------------------------------------------
                                     (v/ /*) |         (1/0/0)
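On reading this output: the v marks a result that is significantly better than the baseline in column (1) at the chosen significance level, * would mark a significantly worse one, and the last row counts wins/ties/losses. The (100) is the number of paired results, which here would be 10 runs of 10 folds if the experiment kept its default run range. As far as I understand it, PairedCorrectedTTester bases that decision on the corrected resampled t-test (Nadeau and Bengio), which inflates the variance by the test/train size ratio. The following is a rough plain-Java sketch of that statistic only, not WEKA's actual code; the class and method names are invented for illustration.

```java
// Illustrative sketch of the corrected resampled t statistic.
// Not WEKA's implementation; names here are hypothetical.
final class CorrectedTTestSketch {

    // diffs: per-fold (and per-run) differences in the compared metric,
    //        e.g. Percent_correct of scheme 2 minus scheme 1.
    // testTrainRatio: number of test instances divided by number of
    //        training instances (1/9 for 10-fold cross-validation).
    static double correctedT(double[] diffs, double testTrainRatio) {
        int k = diffs.length;
        if (k < 2) {
            throw new IllegalArgumentException("need at least two paired results");
        }
        double mean = 0.0;
        for (double d : diffs) {
            mean += d;
        }
        mean /= k;

        double sumSquares = 0.0;
        for (double d : diffs) {
            sumSquares += (d - mean) * (d - mean);
        }
        double variance = sumSquares / (k - 1); // sample variance

        // The standard paired t-test would use variance / k here; the
        // correction adds testTrainRatio to 1/k. If all differences are
        // identical the variance is zero and the result is not finite.
        return mean / Math.sqrt((1.0 / k + testTrainRatio) * variance);
    }

    public static void main(String[] args) {
        double[] diffs = {1.0, 2.0, 3.0};
        System.out.println(correctedT(diffs, 1.0 / 9.0));
    }
}
```

The resulting value is then compared against a t distribution with k - 1 degrees of freedom at the level passed to setSignificanceLevel (0.05 in the question).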