val a = sc.parallelize(1 to 20, 3)
val b = a.sample(true, 0.8, 0)
val c = a.sample(false, 0.8, 0)
println("RDD a: " + a.collect().mkString(","))
println("RDD b: " + b.collect().mkString(","))
println("RDD c: " + c.collect().mkString(","))
RDD a: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
RDD b: 1, 2, 2, 3, 3, 4, 4, 6, 7, 9, 9, 10, 12, 14, 14, 15, 16, 17, 17, 17, 18, 18, 18
RDD c: 1, 2, 4, 5, 8, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20
The only conclusion I can draw is this: when withReplacement is true, the returned subset can contain repeated elements; when it is false, the returned subset has no repeats.
In both cases the size of the returned subset is roughly 20 * 0.8.
But these are only my own observations. What is this parameter actually for?
CodePudding user response:
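sample means sampling, and withReplacement says whether the sampling is done with replacement. If it is true, a sampled element is put back and can be drawn again, so the result may contain duplicates. If it is false, a sampled element is not put back, so each element appears at most once.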
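To make the difference concrete, here is a minimal sketch in plain Scala, without Spark. It is not Spark's actual code: the names SampleSketch, sampleWithReplacement, sampleWithoutReplacement and poisson are made up for illustration. It assumes, as far as I know is the case for RDD.sample, that sampling without replacement keeps each element with probability fraction, and that sampling with replacement emits each element a Poisson-distributed number of times with mean fraction. That would also explain why the result size is only approximately 20 * 0.8 and not exact.

```scala
import scala.util.Random

object SampleSketch {
  // Without replacement: each element is kept at most once,
  // independently, with probability `fraction`.
  def sampleWithoutReplacement[T](data: Seq[T], fraction: Double, rng: Random): Seq[T] =
    data.filter(_ => rng.nextDouble() < fraction)

  // Draw a Poisson-distributed count with the given mean
  // (Knuth's multiplication method, adequate for small means).
  def poisson(mean: Double, rng: Random): Int = {
    val limit = math.exp(-mean)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // With replacement: each element is emitted 0, 1, 2, ... times,
  // `fraction` times on average, so duplicates are possible.
  def sampleWithReplacement[T](data: Seq[T], fraction: Double, rng: Random): Seq[T] =
    data.flatMap(x => Seq.fill(poisson(fraction, rng))(x))

  def main(args: Array[String]): Unit = {
    val a = 1 to 20
    val rng = new Random(0)

    val b = sampleWithReplacement(a, 0.8, rng)
    val c = sampleWithoutReplacement(a, 0.8, rng)

    println("b: " + b.mkString(","))
    println("c: " + c.mkString(","))

    // Without replacement the result can never contain duplicates,
    // because the input itself has none.
    assert(c.distinct.size == c.size)
    // Both samples only ever contain elements of the input.
    assert(b.forall(x => a.contains(x)) && c.forall(x => a.contains(x)))
  }
}
```

The exact numbers printed will differ from the Spark output in the question, since the random number generator and partitioning are different. Only the pattern is the same: b may repeat values, c never does.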