What is the withReplacement parameter of RDD sample for?

Time:09-30

Here is my own test:
val a = sc.parallelize(1 to 20, 3)
val b = a.sample(true, 0.8, 0)
val c = a.sample(false, 0.8, 0)
println("RDD a: " + a.collect().mkString(","))
println("RDD b: " + b.collect().mkString(","))
println("RDD c: " + c.collect().mkString(","))

RDD a: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
RDD b: 1, 2, 2, 3, 3, 4, 4, 6, 7, 9, 9, 10, 12, 14, 14, 15, 16, 17, 17, 17, 18, 18, 18
RDD c: 1, 2, 4, 5, 8, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20

The only conclusion I can draw is that when withReplacement is true, the returned sample can contain repeated elements, and when it is false, the returned subset contains no repeats.
In both cases the size of the result is roughly 20 * 0.8.

But these are only my observations. What is this parameter actually used for?

CodePudding user response:

sample means sampling, and withReplacement specifies whether the sampling is done with replacement.
true means a drawn element is put back and can be drawn again; false means it is not put back, so each element appears at most once.
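
To make the difference concrete, here is a minimal sketch in plain Scala (no Spark needed) that mimics the two modes. It assumes the behavior described in the Spark API docs: without replacement, fraction is the probability that each element is kept; with replacement, fraction is the expected number of times each element is drawn. The object and function names (SampleSketch, sampleWithReplacement, sampleWithoutReplacement, poisson) are made up for this illustration and are not Spark's own code.

```scala
import scala.util.Random

object SampleSketch {
  // Without replacement: keep each element independently with probability
  // `fraction`, so no element can appear more than once.
  def sampleWithoutReplacement[T](data: Seq[T], fraction: Double, rng: Random): Seq[T] =
    data.filter(_ => rng.nextDouble() < fraction)

  // Draw a Poisson(mean)-distributed count with Knuth's multiplication
  // method (adequate for small means such as 0.8).
  def poisson(mean: Double, rng: Random): Int = {
    val limit = math.exp(-mean)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // With replacement: emit each element a Poisson(fraction)-distributed
  // number of times, so it may appear zero, one, or several times.
  def sampleWithReplacement[T](data: Seq[T], fraction: Double, rng: Random): Seq[T] =
    data.flatMap(x => Seq.fill(poisson(fraction, rng))(x))

  def main(args: Array[String]): Unit = {
    val rng = new Random(0)
    val a = 1 to 20
    println("with replacement:    " + sampleWithReplacement(a, 0.8, rng).mkString(","))
    println("without replacement: " + sampleWithoutReplacement(a, 0.8, rng).mkString(","))
  }
}
```

In both modes the size of the result is random: it is about fraction * count on average, not exactly that. This is why b above has 23 elements and c has 15 rather than exactly 16.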