I have a Spark pair RDD of (key, count) tuples, as below:
Array[(String, Int)] = Array((a,1), (b,2), (c,3), (d,4))
I want to add the maximum count as a new element of every tuple in the RDD:
Array[(String, Int, Int)] = Array((a,1,4), (b,2,4), (c,3,4), (d,4,4))
CodePudding user response:
In your definition, you are saying:
(a, 1) -> (a, 1, 4)
(b, 2) -> (b, 2, 4)
(c, 1) -> (c, 3, 4)   where is the 3 coming from now?
(d, 3) -> (d, 4, 4)   where is the 4 coming from now?
If your new max is the maximum value in your RDD plus one, you can sort descending by value and take the value of the first element:
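// Largest value in the RDD, plus one; first() returns only the top element to the driver
val max = df1.sortBy(_._2, ascending = false).first()._2 + 1
// Append that value as a third element of every tuple
val df2 = df1.map(r => (r._1, r._2, max))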
This gives:
(a,1,4)
(b,2,4)
(c,1,4)
(d,3,4)
which should be what you want.
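If the plain maximum (without the plus one) is what you are after, as the expected output in the question suggests, a full sort is probably not needed. Here is a minimal sketch that uses the RDD `max()` action instead. The names `MaxColumnSketch` and `withMax` and the local SparkSession setup are made up for illustration and are not from the question; it assumes Spark is on the classpath and that the RDD is not empty (`max()` throws on an empty RDD).

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, named only for this example.
object MaxColumnSketch {
  // Appends the largest count in the RDD to every (key, count) pair.
  def withMax(pairs: RDD[(String, Int)]): RDD[(String, Int, Int)] = {
    // max() is an action: it scans the values once and returns a plain Int
    // to the driver, so no sort or shuffle is required.
    val max = pairs.values.max()
    // The Int is captured by the closure and shipped to the executors.
    pairs.map { case (key, count) => (key, count, max) }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just to make the sketch runnable.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("max-column-sketch")
      .getOrCreate()

    // Sample data taken from the question.
    val df1 = spark.sparkContext.parallelize(Seq(("a", 1), ("b", 2), ("c", 3), ("d", 4)))

    withMax(df1).collect().foreach(println)

    spark.stop()
  }
}
```

With the data from the question this should give (a,1,4), (b,2,4), (c,3,4), (d,4,4). If you do want the maximum plus one, add `+ 1` to the `max` line.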