How to change an array of integers to individual columns in Spark (Scala)?

Time:04-12

I followed this solution for one-hot encoding. Now I want to expand the last element of each tuple (an array of integers) so that each one-hot-encoded value gets its own column.

My current RDD is:

scala> encode_cars
res2: org.apache.spark.rdd.RDD[(Double, Double, Double, Double, Array[Int])] = MapPartitionsRDD[17] at map at <console>:27

Ideally, I would like something like:

res2: org.apache.spark.rdd.RDD[(Double, Double, Double, Double, Int, Int, Int, Int, Int, Int, Int)] = MapPartitionsRDD[17] at map at <console>:27

I know this could be done with map or flatMap, but I am not sure how to do it.

CodePudding user response:

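I found an easy solution: index into the array inside a map call. Scala arrays are zero-indexed, so x._5(0) is the first encoded value; for the seven Int columns in the question, the indices run from 0 to 6:

encode_cars.map(x => (x._1, x._2, x._3, x._4, x._5(0), x._5(1), x._5(2), x._5(3), x._5(4), x._5(5), x._5(6)))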
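
As a variation on the indexing approach, the same flattening can be sketched with pattern matching, which also checks the array length. This is only a minimal sketch: the type aliases Encoded and Flat, the helper flatten, and the sample data are hypothetical names made up for illustration, and a plain Seq stands in for the RDD so the snippet runs without Spark. It assumes every array holds exactly seven values, as in the desired type from the question.

```scala
// Hypothetical aliases mirroring the element types shown in the question.
type Encoded = (Double, Double, Double, Double, Array[Int])
type Flat = (Double, Double, Double, Double, Int, Int, Int, Int, Int, Int, Int)

// Destructure the tuple and the array in one pattern.
// Array(a, ..., g) matches only arrays with exactly seven elements.
def flatten(row: Encoded): Flat = row match {
  case (d1, d2, d3, d4, Array(a, b, c, d, e, f, g)) =>
    (d1, d2, d3, d4, a, b, c, d, e, f, g)
  case (_, _, _, _, arr) =>
    // Fail loudly instead of silently dropping or misaligning columns.
    throw new IllegalArgumentException(s"expected 7 encoded values, got ${arr.length}")
}

// Made-up sample row; a Seq is used here in place of the RDD.
val sample: Seq[Encoded] = Seq((1.0, 2.0, 3.0, 4.0, Array(0, 0, 1, 0, 0, 0, 0)))
val flat: Seq[Flat] = sample.map(flatten)
```

On the RDD from the question, the equivalent call would presumably be encode_cars.map(flatten), since RDD.map accepts the same kind of function. The explicit length check is a design choice: direct indexing throws an ArrayIndexOutOfBoundsException on short arrays and silently ignores extra elements on long ones.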