How does Spark's reduceByKey compute multiple values per key? As shown in the figure

Time:09-16

CodePudding user response:

Quoting the original poster Yt_Sports's reply:

reduceByKey works on key-value pairs, but what you have here is a three-field record. Pack the two extra fields into the value, run reduceByKey, and then post-process the result.
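A minimal sketch of that suggestion in the Spark shell: pack the two extra fields into a tuple value, reduce per key, then post-process as needed (sample values taken from the poster's input below):

val pairs = sc.parallelize(Seq(("num", (10, 20)), ("num", (11, 22))))
// Sum both tuple fields per key
val summed = pairs.reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
summed.collect()  // Array((num,(21,42)))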

CodePudding user response:

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

JavaPairRDD<String, String> rdd1 = lines.mapToPair(new PairFunction<String, String, String>() {
    @Override
    public Tuple2<String, String> call(String arg0) throws Exception {
        String temp = arg0.split(" ")[0];
        String temp2 = arg0.split(" ")[1];
        String temp3 = arg0.split(" ")[2];
        // Pack the two numeric fields into one string value, separated by "-"
        return new Tuple2<String, String>(temp, temp2 + "-" + temp3);
    }
});

JavaPairRDD<String, String> rdd2 = rdd1.reduceByKey(new Function2<String, String, String>() {
    @Override
    public String call(String arg0, String arg1) throws Exception {
        // Sum the first field of both values
        int a = Integer.parseInt(arg0.split("-")[0]);
        int a2 = Integer.parseInt(arg1.split("-")[0]);
        String aa = String.valueOf(a + a2);
        // Sum the second field of both values
        int b = Integer.parseInt(arg0.split("-")[1]);
        int b2 = Integer.parseInt(arg1.split("-")[1]);
        String bb = String.valueOf(b + b2);
        return aa + "-" + bb;
    }
});

JavaRDD<String> rdd3 = rdd2.map(new Function<Tuple2<String, String>, String>() {
    @Override
    public String call(Tuple2<String, String> arg0) throws Exception {
        // Unpack "a-b" back into "key a b"
        return arg0._1() + " " + arg0._2().split("-")[0] + " " + arg0._2().split("-")[1];
    }
});

System.out.println(rdd3.collect());

Input:
num 10 20
num 11 22
name 22 33
CMJ 21 332

Output:
[CMJ 21 332, num 21 42, name 22 33]

CodePudding user response:

No need for string concatenation; just use map to change the value type first:


scala> val c = sc.parallelize(List(("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("c", 1)))
c: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[28] at parallelize at <console>:21

scala> val cm = c.map(e => (e._1, (e._2, 0)))
cm: org.apache.spark.rdd.RDD[(String, (Int, Int))] = MapPartitionsRDD[25] at map at <console>:23

scala> val cr = cm.reduceByKey((e1, e2) => (e1._1 + e2._1, e1._1 / 2 + e2._1 / 2))
cr: org.apache.spark.rdd.RDD[(String, (Int, Int))] = ShuffledRDD[26] at reduceByKey at <console>:25

scala> val cz = cr.map(e => (e._1, e._2._1, e._2._2))
cz: org.apache.spark.rdd.RDD[(String, Int, Int)] = MapPartitionsRDD[27] at map at <console>:27

scala> cz.collect
res15: Array[(String, Int, Int)] = Array((b,3,1), (a,6,2), (c,1,0))

CodePudding user response:

reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))

CodePudding user response:

reduceByKey((x, y) => (x._1 + x._2, y._1 + y._2), 10)

CodePudding user response:

Quoting check fish's reply on the 5th floor:
reduceByKey((x, y) => (x._1 + x._2, y._1 + y._2), 10)

That's written wrong. It should be:
reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
The reply above is right.
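A quick Spark-shell check of the difference, using values from the thread's input: the first reducer mixes the two fields of each value, while the second sums field-wise.

val p = sc.parallelize(Seq(("num", (10, 20)), ("num", (11, 22))))
p.reduceByKey((x, y) => (x._1 + x._2, y._1 + y._2)).collect()  // Array((num,(30,33))) -- wrong: adds the two fields together
p.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2)).collect()  // Array((num,(21,42))) -- correct field-wise sums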

CodePudding user response:

The basic idea:
Map "num 10 20" to (num, (10, 20)), with num as the key.
Then sum per key: (key, (10 + 10, 20 + 20)).
map(x => (x._1, (x._2, x._3))).reduceByKey((s, h) => (s._1 + h._1, s._2 + h._2))
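Putting that idea together as a runnable Spark-shell sketch over the thread's sample lines (the raw input is text, so split each line first; names are illustrative):

val lines = sc.parallelize(Seq("num 10 20", "num 11 22", "name 22 33", "CMJ 21 332"))
val result = lines
  .map(_.split(" "))
  .map(a => (a(0), (a(1).toInt, a(2).toInt)))        // "num 10 20" -> (num, (10, 20))
  .reduceByKey((s, h) => (s._1 + h._1, s._2 + h._2)) // field-wise sums per key
result.collect()  // e.g. Array((CMJ,(21,332)), (num,(21,42)), (name,(22,33))) -- ordering may vary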
