Home > Back-end >  Something strange about groupBy on spark
Something strange about groupBy on spark

Time:07-24

I'm learning about groupBy function on spark,I create a list with 2 partitions,then use groupBy to get every odd and even numbers.I found if I define

val rdd = sc.makeRDD(List(1, 2, 3, 4),2) 
val result = rdd.groupBy(_ % 2 )

the result with goes to their own partition. But if I define

val result = rdd.groupBy(_ % 2 ==0)

the result turns to in one partition.could anybody explain why?

CodePudding user response:

It's just the hashing applied to groupBy Shuffle.

 val rdd = sc.makeRDD(List(1, 2, 3, 4), 5) 

With 5 partitions you see 2 partitions used and 3 empty. Just an algorithm.

  • Related