What happens when we repartition data to a higher number of partitions than the spark.sql.shuffle.partitions property? Are the two related?
It depends on which variant of Dataset.repartition you call.
If you call repartition(partitionExprs: Column*): Dataset[T]
- in this case the data is hash-partitioned by the given expressions, and the number of partitions is taken from the spark.sql.shuffle.partitions property (200 by default).
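If you call repartition(numPartitions: Int): Dataset[T]
- in this case the number of partitions is exactly the numPartitions argument you pass, and spark.sql.shuffle.partitions plays no role. So repartitioning to a number higher than spark.sql.shuffle.partitions simply gives you that higher number of partitions; the property is not an upper bound.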
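A minimal sketch in Scala comparing the two variants, assuming a local Spark 3.x session. The value 8 for spark.sql.shuffle.partitions, the object name RepartitionDemo and the app name are illustrative only. Adaptive query execution is switched off so that the expression-based shuffle is not coalesced at runtime and the reported counts reflect the configured values.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RepartitionDemo {
  // Local session for illustration; shuffle.partitions is deliberately set low (assumed value: 8).
  // AQE is disabled so the expression-based shuffle keeps its configured partition count.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("repartition-demo")
    .config("spark.sql.shuffle.partitions", "8")
    .config("spark.sql.adaptive.enabled", "false")
    .getOrCreate()

  // A simple Dataset with a single "id" column holding 0..999
  val df = spark.range(1000)

  // Expression-only variant: partition count comes from spark.sql.shuffle.partitions
  val byExpr: Int = df.repartition(col("id")).rdd.getNumPartitions

  // Explicit count: 50 is higher than spark.sql.shuffle.partitions, and the explicit value wins
  val byCount: Int = df.repartition(50).rdd.getNumPartitions

  def main(args: Array[String]): Unit = {
    println(s"repartition(col(\"id\")) -> $byExpr partitions")
    println(s"repartition(50)          -> $byCount partitions")
    spark.stop()
  }
}
```

Running it should report 8 partitions for the expression-only call and 50 for the explicit call, even though 50 exceeds spark.sql.shuffle.partitions.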