About Spark partition index

Time:09-29

I'd like to ask the experts here a question.

Suppose an RDD has two partitions, each stored on a machine.
If, in each partition, the hash value of every key equals that partition's index, can hash partitioning be guaranteed to move no data between physical machines?
For example, partition 0 holds (0, 0) and partition 1 holds (1, 1).
Will hash repartitioning then guarantee that (0, 0) and (1, 1) stay where they originally were on the physical machines?
Is there a way to guarantee this? In practice it would save network traffic.

Thanks in advance.

CodePudding user response:

This should be something Spark decides for itself; take a look at the methods in RDD.
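
To make the question concrete, here is a rough sketch in plain Python (not Spark code) of how a hash partitioner picks a partition index: the key's hash code modulo the number of partitions, kept non-negative. It assumes integer keys whose hash code is the integer itself, as on the JVM. The names `hash_partition` and `records_that_change_partition` are made up for this illustration and are not Spark APIs.

```python
def hash_partition(key: int, num_partitions: int) -> int:
    """Partition index for an integer key under hash partitioning.

    Assumption: the key's hash code is the integer itself. Python's % already
    returns a non-negative result for a positive modulus.
    """
    return key % num_partitions


def records_that_change_partition(partitions, num_partitions):
    """List the records whose hash-assigned index differs from their current one.

    `partitions` is a list of lists of (key, value) pairs; the position of each
    inner list is that partition's index.
    """
    moved = []
    for index, records in enumerate(partitions):
        for key, value in records:
            target = hash_partition(key, num_partitions)
            if target != index:
                moved.append(((key, value), index, target))
    return moved


# The layout from the question: (0, 0) on partition 0 and (1, 1) on partition 1.
current_layout = [[(0, 0)], [(1, 1)]]
print(records_that_change_partition(current_layout, 2))  # prints []
```

So with this layout every record already has the index that hash partitioning would give it. That only says the partition index stays the same; it does not by itself say the data stays on the same physical machine, because where the tasks run is up to Spark's scheduler. As far as I know, `partitionBy` skips the shuffle only when the RDD already carries an equal partitioner, so an RDD that merely happens to be laid out this way may still be shuffled.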