Understanding random forest out-of-bag data: where did I go wrong?
The original passage says: "Given an original dataset containing m samples, draw from it m times with replacement. On each draw, a given sample is picked with probability 1/m, so the probability that it is not picked is $1 - 1/m$, and the probability that it is never picked in all m draws is $(1 - 1/m)^m$, which tends to $1/e \approx 0.368$ as m grows. So in each round of bagging, about 36.8% of the training data is not sampled; this data is called out-of-bag (OOB) data."
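The passage above can be checked numerically. The sketch below computes $(1 - 1/m)^m$ exactly for a few values of m and then simulates a single bootstrap round, counting the indices that were never drawn. The function names `never_drawn_prob` and `oob_fraction_one_round` are hypothetical, chosen only for illustration, and the draws are assumed to be uniform and independent. Under those assumptions, both numbers should come out close to 0.368 for large m.

```python
import math
import random

def never_drawn_prob(m: int) -> float:
    """Exact chance that one given sample is missed by all m draws with replacement."""
    return (1 - 1 / m) ** m

def oob_fraction_one_round(m: int, seed: int = 0) -> float:
    """Simulate one bootstrap round (m draws with replacement) and
    return the fraction of the m indices that were never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(m) for _ in range(m)}  # distinct indices hit at least once
    return 1 - len(drawn) / m

for m in (10, 100, 10_000):
    print(f"m={m}: (1 - 1/m)^m = {never_drawn_prob(m):.4f}")
print(f"1/e = {1 / math.e:.4f}")
print(f"simulated OOB fraction, m=10000: {oob_fraction_one_round(10_000):.3f}")
```

The 36.8% refers to a single round: it is the share of the data that one bootstrap sample, and therefore one tree, never sees.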
First of all, is it really true that "each sample is picked with probability 1/m on every draw"? Each time, k data points are drawn (k … By analogy, I found that after sampling m times with replacement, almost every data point had been picked at least once. So where is this "about 36.8% of the training set that is never sampled", the out-of-bag data? I can't find it!
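Part of the sentence above is lost, so this reading is an assumption: if "sampling m times" means m separate bootstrap rounds, rather than the m draws inside a single round, then the chance of never being drawn compounds across rounds. With R independent rounds of m draws each:

$$
P(\text{never drawn in any of } R \text{ rounds}) = \left[\left(1 - \frac{1}{m}\right)^{m}\right]^{R} \approx e^{-R}
$$

So 36.8% is a per-round figure. After even R = 10 rounds, only about $e^{-10} \approx 4.5 \times 10^{-5}$ of the data would still be unsampled, which matches "almost all the data was collected at least once".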
In practice, here is the situation I ran into: I have 400 trees, and each tree is built from a random 80% of the samples. Of course, for every tree 20% of the data is not selected, but after 400 rounds of sampling, the number of samples that have never been selected is zero! Zero! Where did I go wrong?
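Here is a small simulation of this setup. It relies on assumptions the post does not state: 1000 rows, and each tree's 80% drawn without replacement. `simulate_forest` is a hypothetical helper. The simulation tracks two different quantities: the fraction of rows left out of each individual tree, and, for every row, the number of trees it was left out of.

```python
import random

def simulate_forest(n_samples=1000, n_trees=400, frac=0.8, seed=0):
    """Hypothetical helper: draw an 80% subsample (without replacement) per tree
    and record which rows each tree leaves out."""
    rng = random.Random(seed)
    k = round(frac * n_samples)            # rows used to grow each tree
    oob_counts = [0] * n_samples           # how many trees each row is out-of-bag for
    per_tree_oob = []                      # OOB fraction of each individual tree
    for _ in range(n_trees):
        in_bag = set(rng.sample(range(n_samples), k))
        per_tree_oob.append((n_samples - len(in_bag)) / n_samples)
        for i in range(n_samples):
            if i not in in_bag:
                oob_counts[i] += 1
    # A row is "never sampled" only if it was left out of every single tree.
    never_sampled = sum(1 for c in oob_counts if c == n_trees)
    return per_tree_oob, oob_counts, never_sampled

per_tree_oob, oob_counts, never_sampled = simulate_forest()
print("OOB fraction of first tree:", per_tree_oob[0])
print("rows never sampled by any tree:", never_sampled)
print("rows that are OOB for at least one tree:", sum(1 for c in oob_counts if c > 0))
print("average number of trees a row is OOB for:", sum(oob_counts) / len(oob_counts))
```

Under these assumptions, each tree leaves out exactly 200 of the 1000 rows. A row is left out of every tree only with probability $0.2^{400}$, so the count of never-selected rows is effectively always zero. Conversely, every row is out-of-bag for some trees, on average 80 of the 400.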