Spark: accessing the data within a partition

Time:09-23

data.mapPartitionsWithIndex {
  (index, points) =>
}
How do I access the index and the partition's data inside the curly braces? I'm a beginner, please advise!

CodePudding user response:

Why do you want to access another partition's data?

mapPartitions(func) and mapPartitionsWithIndex(func) exist for performance optimization: they let your function run once per partition, which is why the function type must be Iterator[T] => Iterator[U] (or (Int, Iterator[T]) => Iterator[U] for mapPartitionsWithIndex, where the Int is the partition index). The iterator gives you access to the whole partition's data, but you cannot, and should not, access other partitions' data.
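
To make that concrete, here is a minimal sketch of using both the index and the iterator inside the braces. It assumes the RDD holds Double values and runs Spark in local mode; the object name, the summarize helper, and the sample data are made up for illustration and are not from the original question.

```scala
import org.apache.spark.sql.SparkSession

object PartitionIndexExample {
  // Called once per partition: `index` is the partition number and
  // `points` iterates over the elements of that partition only.
  def summarize(index: Int, points: Iterator[Double]): Iterator[(Int, Int, Double)] = {
    var count = 0
    var sum = 0.0
    points.foreach { p =>
      count += 1
      sum += p
    }
    // Emit one record per partition: (partition index, element count, sum).
    Iterator((index, count, sum))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partition-index-example")
      .getOrCreate()

    // Hypothetical sample data spread over 3 partitions.
    val data = spark.sparkContext.parallelize(Seq(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), numSlices = 3)

    val perPartition = data.mapPartitionsWithIndex { (index, points) =>
      summarize(index, points)
    }

    perPartition.collect().foreach { case (i, n, s) =>
      println(s"partition $i: $n points, sum = $s")
    }

    spark.stop()
  }
}
```

Note that the iterator can only be traversed once. If you need several passes over the same partition, one option is to materialize it first (for example with points.toArray), at the cost of holding the whole partition in memory.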

CodePudding user response:

Why do you want to access data from another partition?

mapPartitions(func) and mapPartitionsWithIndex(func) are optimizations: they let your function process one whole partition at a time, which is why the function is handed an Iterator. You can walk through all of that partition's data with the iterator, but one partition's iterator cannot reach the data of other partitions.

CodePudding user response:

The reply above explains it very clearly. If you need to access data across rows, including rows that sit in other partitions, use a self join.
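
A rough sketch of what such a self join could look like on an RDD, assuming the goal is to compare each row with the row that follows it. The successiveDiffs helper, the object name, and the sample data are hypothetical; the original thread does not say what the cross-row computation is.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SelfJoinExample {
  // Pairs every row with its successor, regardless of which partition
  // each of the two rows lives in, and returns (position, next - current).
  def successiveDiffs(data: RDD[Double]): RDD[(Long, Double)] = {
    // Give each row a global position to use as the join key.
    val byPos = data.zipWithIndex().map { case (value, pos) => (pos, value) }
    // Shift the keys down by one so that row i+1 lines up with row i.
    val shifted = byPos.map { case (pos, value) => (pos - 1, value) }
    // Self join: the RDD is joined with a re-keyed copy of itself.
    byPos.join(shifted)
      .map { case (pos, (current, next)) => (pos, next - current) }
      .sortByKey()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("self-join-example")
      .getOrCreate()

    // Hypothetical sample data spread over 2 partitions.
    val data = spark.sparkContext.parallelize(Seq(10.0, 20.0, 40.0, 70.0), numSlices = 2)

    successiveDiffs(data).collect().foreach(println)

    spark.stop()
  }
}
```

The join shuffles data between partitions, so it is more expensive than mapPartitions. That cost is the price of seeing rows from different partitions together.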