How to manually assign tasks evenly to different partitions in Spark

Time:09-23

Hi everyone, I have a problem: I am reading data from Redis, with one RDD partition per Redis partition. I need the data in each partition to be processed differently depending on which partition it is. How should I handle this?

CodePudding user response:

You can branch inside the foreachPartition operator and call the appropriate handler based on a condition.
For example:

rdd.foreachPartition(new VoidFunction<Iterator<String>>() {

    @Override
    public void call(Iterator<String> it) throws Exception {
        if (xxxxx) { // condition, e.g. traverse the partition's data and check for a certain feature
            new RealForeachPartitionFunc1().call(it); // run the real per-partition logic
        } else {
            new RealForeachPartitionFunc2().call(it);
        }
    }
});
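Outside of a Spark job, the dispatch pattern in this answer can be sketched in plain Java with no Spark dependency. The matchesFeature condition and the handler names below are hypothetical stand-ins for the xxxxx check and the RealForeachPartitionFunc classes above; note that peeking at the first element consumes it from the iterator, so a real implementation would need to hand that element back to the chosen handler.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PartitionDispatch {
    // Hypothetical condition standing in for the xxxxx check:
    // here, a partition whose first record has a "redis:" prefix
    // is considered to have the feature.
    static boolean matchesFeature(String sample) {
        return sample != null && sample.startsWith("redis:");
    }

    // Mirrors the if/else inside foreachPartition: inspect the
    // partition and return the name of the handler that would run.
    static String dispatch(Iterator<String> partition) {
        String first = partition.hasNext() ? partition.next() : null;
        return matchesFeature(first)
                ? "RealForeachPartitionFunc1"
                : "RealForeachPartitionFunc2";
    }

    public static void main(String[] args) {
        // Two mock "partitions" of data, as plain lists.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("redis:a", "redis:b"),
                Arrays.asList("other:c"));
        for (List<String> p : partitions) {
            System.out.println(dispatch(p.iterator()));
        }
    }
}
```

The key point is that foreachPartition hands you the whole partition as one iterator, so any per-partition decision has to be made from the data itself (or from state captured in the closure), not from a partition index.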

CodePudding user response:

Use a custom partitioner: rdd.partitionBy(new TestPartitioner() /* custom partitioner */).foreachPartition { /* code that operates on the data in the partition */ }

class TestPartitioner extends Partitioner {

  // number of Redis partitions
  override def numPartitions: Int = ???

  override def getPartition(key: Any): Int = {
    /* Redis partitioning rule goes here */
  }
}
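The thread leaves the getPartition body open. One common rule (an assumption here, not stated in the thread) is to hash the key modulo the Redis partition count, guarding against negative hash codes so the result always lands in [0, numPartitions). A minimal plain-Java sketch of that rule:

```java
public class RedisPartitionRule {
    private final int numPartitions;

    public RedisPartitionRule(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Maps a key to a partition index in [0, numPartitions).
    // The double-modulo guards against negative hashCode values,
    // which a plain (h % n) would otherwise let through as negative.
    public int getPartition(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return ((h % numPartitions) + numPartitions) % numPartitions;
    }
}
```

Whatever rule you pick, it must be deterministic: the same key must always map to the same partition, otherwise records for one Redis partition would scatter across several Spark partitions.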

CodePudding user response:

Use a custom partitioner.