A concrete example of when to use the bootstrap method (bootstrapping)-CodePudding

Today I read section 2.2.3 (chapter 2) of the watermelon book, on the bootstrap method (bootstrapping), and got confused. The book says the method is useful when the data set is small and it is hard to split it effectively into a training set and a test set. Maybe I'm being dense, but I can't think of an example: under what circumstances is a small data set hard to split effectively? The few examples I found online didn't satisfy me. Could an expert clear this up for me?

CodePudding user response:

Suppose the data set has two classes, labeled 1 and 0, with 30 samples in class 1 and 10 samples in class 0, so m = 40 in total. Hold-out and cross-validation are a poor fit here: after the split there are too few training and test samples, and with the classes imbalanced the estimate becomes unreliable (for example, a stratified 20% hold-out test set would have 8 samples, only 2 of them from class 0). The bootstrap method instead samples with replacement. After drawing m times, the training set still has m = 40 entries; roughly 63.2% of the original samples (about 25) appear in it at least once, and the roughly 36.8% (about 15) that were never drawn serve as the test set. The training set is therefore the same size as the original data set rather than smaller, so the model being evaluated is trained on as many samples as one trained on the full data.
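
To make the numbers concrete, here is a minimal sketch of one bootstrap split on the toy data set described above (30 samples of class 1, 10 of class 0). The function name `bootstrap_split` and the seed are my own choices for illustration, not something from the book. The fraction of samples left out of the training set should come out near (1 - 1/m)^m, which is about 0.363 for m = 40 and approaches 1/e ≈ 0.368 as m grows.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement for training.

    Indices that were never drawn (the "out-of-bag" samples) form the test set.
    """
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train_idx)
    test_idx = [i for i in range(n_samples) if i not in drawn]
    return train_idx, test_idx


# Toy data set: 30 samples of class 1 and 10 samples of class 0 (m = 40).
labels = [1] * 30 + [0] * 10
m = len(labels)

rng = random.Random(0)  # fixed seed only so the run is repeatable
train_idx, test_idx = bootstrap_split(m, rng)

print("training set size (with duplicates):", len(train_idx))
print("distinct samples in training set:", len(set(train_idx)))
print("test set size (never drawn):", len(test_idx))

# Average the left-out fraction over many splits and compare with theory.
rounds = 2000
total_left_out = sum(len(bootstrap_split(m, rng)[1]) for _ in range(rounds))
oob_fraction = total_left_out / (rounds * m)
print("average left-out fraction:", round(oob_fraction, 3))
print("theoretical (1 - 1/m)**m:", round((1 - 1 / m) ** m, 3))
```

Note that the training set always has exactly m entries, while the size of the test set varies from split to split, so in practice the evaluation is usually repeated over several bootstrap rounds.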