I am new to Apache Spark and am not able to get this to work.
I have an RDD of the form (Int, (Int, Int)), and for each key I would like to sum the first elements of the values while collecting the second elements.
For example, I have the following RDD:
[(5,(1,0)), (5,(1,2)), (5,(1,5))]
And I want to be able to get something like this:
(5,3,(0,2,5))
I tried this:
sampleRdd.reduceByKey{ case (a, (b, c)) => (a + b) }
But I get this error:
type mismatch;
[error] found : Int
[error] required: String
[error] .reduceByKey{ case (a, (b, c)) => (a + b) }
[error] ^
How can I achieve this?
Answer:
reduceByKey alone can't produce this, because its function must return the same type as the RDD's value, here (Int, Int), while you want a sum plus a collection. (The odd "required: String" error appears because the pattern binds a to the whole (Int, Int) tuple; tuples have no + method, so the compiler falls back to Scala's implicit string-concatenation +, which expects a String.) aggregateByKey lets the accumulated type differ from the value type, so please try this:
// Fold one value into the per-key accumulator:
// add the first element, append the second element as a string.
def seqOp = (accumulator: (Int, List[String]), element: (Int, Int)) =>
  (accumulator._1 + element._1, accumulator._2 :+ element._2.toString)

// Merge two partial accumulators (e.g. from different partitions).
def combOp = (accumulator1: (Int, List[String]), accumulator2: (Int, List[String])) => {
  (accumulator1._1 + accumulator2._1, accumulator1._2 ::: accumulator2._2)
}

val zeroVal = (0, List.empty[String])

rdd.aggregateByKey(zeroVal)(seqOp, combOp).collect
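For reference, here is a minimal end-to-end sketch of the above on your sample data. It assumes a SparkContext named sc, as in spark-shell, and reuses the seqOp, combOp and zeroVal definitions from the answer.

val sampleRdd = sc.parallelize(Seq((5, (1, 0)), (5, (1, 2)), (5, (1, 5))))

sampleRdd.aggregateByKey(zeroVal)(seqOp, combOp).collect
// Array((5,(3,List(0, 2, 5)))) : key 5, summed firsts 1+1+1 = 3, collected seconds

If you would rather stay close to your original reduceByKey attempt, a sketch under the same assumptions: reshape each value into the target type with mapValues first, so the reducer's input and output types match.

sampleRdd
  .mapValues { case (sum, item) => (sum, List(item.toString)) }
  .reduceByKey { case ((s1, l1), (s2, l2)) => (s1 + s2, l1 ::: l2) }
  .collect
// Array((5,(3,List(0, 2, 5))))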