What's the difference between repartition() and spark.sql.shuffle.partitions?

Time:10-07

What happens when we repartition data into more partitions than the spark.sql.shuffle.partitions property specifies? Are the two related?

CodePudding user response:

It depends on which variant of Dataset.repartition you call.

If you call repartition(partitionExprs: Column*): Dataset[T], the number of partitions is taken from the spark.sql.shuffle.partitions setting.

If you call repartition(numPartitions: Int): Dataset[T], the number of partitions is the numPartitions value you pass, regardless of spark.sql.shuffle.partitions, so it can be higher or lower than that setting.
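The two cases can be checked with a small local job. The following is only a sketch under stated assumptions: the object name RepartitionDemo, the helper partitionCounts, the column name "id", and the values 8 and 20 are all made up for illustration. Adaptive query execution is switched off here because, when it is enabled, Spark may coalesce shuffle partitions and report a smaller count for the column-based variant.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RepartitionDemo {

  // Hypothetical helper: returns the partition counts produced by
  // (column-only repartition, number-only repartition).
  def partitionCounts(spark: SparkSession): (Int, Int) = {
    val df = spark.range(0, 1000).toDF("id")

    // repartition(partitionExprs: Column*): no explicit count is given,
    // so the count follows spark.sql.shuffle.partitions.
    val byColumn = df.repartition(col("id"))

    // repartition(numPartitions: Int): the explicit count is used,
    // even though it is higher than spark.sql.shuffle.partitions.
    val byNumber = df.repartition(20)

    (byColumn.rdd.getNumPartitions, byNumber.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartition-demo")
      .config("spark.sql.shuffle.partitions", "8")   // illustrative value
      .config("spark.sql.adaptive.enabled", "false") // keep counts deterministic
      .getOrCreate()

    val (byColumn, byNumber) = partitionCounts(spark)
    println(s"repartition(col):  $byColumn partitions")
    println(s"repartition(20):   $byNumber partitions")

    spark.stop()
  }
}
```

If you want hash partitioning on a column and an explicit count at the same time, there is also a repartition(numPartitions: Int, partitionExprs: Column*) overload, which uses the number you pass.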
