How to add a new element in an RDD

Time:08-09

I have a Spark pair RDD of (key, count) tuples, as below:

Array[(String, Int)] = Array((a,1), (b,2), (c,3), (d,4))

I want to add the maximum count as a new element in every tuple of the RDD:

Array[(String, Int)] = Array((a,1,4), (b,2,4), (c,3,4), (d,4,4))

CodePudding user response:

In the definition, you are saying:

(a, 1) -> (a, 1, 4)

(b, 2) -> (b, 2, 4)

(c, 1) -> (c, 3, 4) where is the 3 coming from now?

(d, 3) -> (d, 4, 4) where is the 4 coming from now?

If the new element should be the maximum value in your RDD plus one, you can sort descending by value and take the value of the first element:

val max = df1.sortBy(_._2, ascending = false).first()._2 + 1

val df2 = df1.map(r => (r._1, r._2, max))

This gives:

(a,1,4)
(b,2,4)
(c,1,4)
(d,3,4)

which should be what you want.
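For the data as posted in the question, (a,1), (b,2), (c,3), (d,4), the expected third field (4) is the largest count itself, so no "+ 1" would be needed. Below is a minimal sketch under that assumption, using `values.max()` rather than a full sort. The object name `AddMaxColumn`, the helper `withMax`, and the local `SparkSession` setup are illustrative and not part of the original post.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object AddMaxColumn {
  // Hypothetical helper: appends the largest count to every (key, count) pair.
  // Assumes the RDD is non-empty; max() fails on an empty RDD.
  def withMax(pairs: RDD[(String, Int)]): RDD[(String, Int, Int)] = {
    // max() is an action: it scans the values once and returns a plain Int
    // to the driver, so no sort or shuffle is required.
    val max = pairs.values.max()
    pairs.map { case (key, count) => (key, count, max) }
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, use the provided `sc`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("add-max-column")
      .getOrCreate()
    try {
      val df1 = spark.sparkContext.parallelize(
        Seq(("a", 1), ("b", 2), ("c", 3), ("d", 4)))
      withMax(df1).collect().foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

If the "plus one" variant is what you need, replacing `pairs.values.max()` with `pairs.values.max() + 1` in the sketch should give the same result as the sort-based version.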
