String>() {
    @Override
    public String call(Tuple2<String, String> arg0) throws Exception {
        String lines = arg0._1() + " " + arg0._2().split("-")[0] + " " + arg0._2().split("-")[1];
        return lines;
    }
});
System.out.println(rdd3.collect());
Input:
num 10 20
num 11 22
name 22 33
cmj 21 332

Output:
[cmj 21 332, num 21 42, name 22 33]
CodePudding user response:
You don't need string concatenation; just use map to change the value type first.
scala> val cm = c.map(e => (e._1, (e._2, 0)))
cm: org.apache.spark.rdd.RDD[(String, (Int, Int))] = MapPartitionsRDD[25] at map at <console>:23

scala> val cr = cm.reduceByKey((e1, e2) => (e1._1 + e2._1, e1._1/2 + e2._1/2))
cr: org.apache.spark.rdd.RDD[(String, (Int, Int))] = ShuffledRDD[26] at reduceByKey at <console>:25

scala> val cz = cr.map(e => (e._1, e._2._1, e._2._2))
cz: org.apache.spark.rdd.RDD[(String, Int, Int)] = MapPartitionsRDD[27] at map at <console>:27

scala> cz.collect
res15: Array[(String, Int, Int)] = Array((b,3,1), (a,6,2), (c,1,0))

scala> val c = sc.parallelize(List(("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("c", 1)))
c: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[28] at parallelize at <console>:21
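Applied to the data format in the question (a pair RDD whose values are strings like "10-20"), the same pattern would look roughly like this. This is only a minimal sketch for the Spark shell (where sc is available); the RDD name raw and the hyphenated sample values are assumptions inferred from the split("-") calls in the question's code:

val raw = sc.parallelize(List(("num", "10-20"), ("num", "11-22"), ("name", "22-33"), ("cmj", "21-332")))
// change the type first: parse the "a-b" string value into an (Int, Int) pair
val parsed = raw.map { case (k, v) => val p = v.split("-"); (k, (p(0).toInt, p(1).toInt)) }
// then sum both fields per key and format back to "key a b" strings
val summed = parsed.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
summed.map { case (k, (a, b)) => k + " " + a + " " + b }.collect
// expected contents (ordering may vary): num 21 42, name 22 33, cmj 21 332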
CodePudding user response:
reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
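As a sketch, applying this to the cm RDD from the transcript above sums both fields of each value (the second field there is always 0):

cm.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2)).collect
// contents (ordering may vary): (a,(6,0)), (b,(3,0)), (c,(1,0))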
CodePudding user response:
reduceByKey((x, y) => (x._1 + x._2, y._1 + y._2), 10)
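The second argument here is the number of partitions for the shuffled result. As a sketch, the per-field sum from the previous reply can also be written with an explicit partition count, again using the cm RDD from the transcript above:

cm.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2), 10)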