Column bind two RDDs in Scala Spark without keys


The two RDDs have the same number of rows. I am searching for the equivalent of R's cbind().

It seems join() always requires a key.

CodePudding user response:

The closest option is the .zip method, followed by a .map to reshape the pairs as needed. E.g.:

// Two RDDs with the same number of rows (and the same partitioning)
val rdd0 = sc.parallelize(Seq( (1, (2,3)), (2, (3,4)) ))
val rdd1 = sc.parallelize(Seq( (200,300), (300,400) ))
// Pair the i-th element of each RDD, then collect to the driver
val zipRdd = (rdd0 zip rdd1).collect

returns:

zipRdd: Array[((Int, (Int, Int)), (Int, Int))] = Array(((1,(2,3)),(200,300)), ((2,(3,4)),(300,400)))
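
If flat cbind-style rows are wanted rather than nested tuples, a subsequent .map can reshape each pair. A minimal sketch continuing the example above (cbindRdd is just an illustrative name):

// Flatten each zipped pair into one cbind-style row
val cbindRdd = (rdd0 zip rdd1).map {
  case ((k, (a, b)), (c, d)) => (k, a, b, c, d)
}
cbindRdd.collect
// Array((1,2,3,200,300), (2,3,4,300,400))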

Note that .zip requires more than equal row counts: the two RDDs must also have the same number of partitions, with the same number of elements in each partition.
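
If the two RDDs have the same number of rows but were not partitioned identically, zip will fail at runtime. A common workaround, sketched here under that assumption (indexed0, indexed1, and joined are illustrative names), is to key each RDD with zipWithIndex and join on the generated index, at the cost of a shuffle:

// Key each RDD by its element index, then join on that index
val indexed0 = rdd0.zipWithIndex().map { case (v, i) => (i, v) }
val indexed1 = rdd1.zipWithIndex().map { case (v, i) => (i, v) }
val joined = indexed0.join(indexed1)
  .sortByKey()                              // restore the original row order
  .map { case (_, (v0, v1)) => (v0, v1) }   // same shape as the zip result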
