I have followed this solution for one-hot encoding. Now I want to expand the last element of each tuple (an array of integers) so that every one-hot encoded value gets its own column.
My current RDD is:
scala> encode_cars
res2: org.apache.spark.rdd.RDD[(Double, Double, Double, Double, Array[Int])] = MapPartitionsRDD[17] at map at <console>:27
and ideally I would want something like:
res2: org.apache.spark.rdd.RDD[(Double, Double, Double, Double, Int, Int, Int, Int, Int, Int, Int)] = MapPartitionsRDD[17] at map at <console>:27
I know that this could be done using map / flatMap, but I am not sure how to do it.
CodePudding user response:
I found an easy solution by just indexing the array inside the map function:
encode_cars.map(x => (x._1, x._2, x._3, x._4, x._5(1), x._5(2), x._5(3)))
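Note that Scala arrays are zero-indexed, so x._5(1), x._5(2), x._5(3) skips the first element and yields only three Int columns. To get the seven-Int shape from the question, one option is to pattern match on the array. The following is only a sketch: the object name FlattenExample, the helper flatten, the type aliases and the sample rows are all made up for illustration, and it assumes every array has exactly 7 elements. A plain Seq stands in for the RDD so the snippet runs without a Spark context.

```scala
object FlattenExample {
  // Hypothetical aliases for the tuple shapes shown in the question.
  type Encoded = (Double, Double, Double, Double, Array[Int])
  type Flat = (Double, Double, Double, Double, Int, Int, Int, Int, Int, Int, Int)

  // Assumption: each array holds exactly 7 values.
  // Any other length fails with a scala.MatchError.
  def flatten(row: Encoded): Flat = row match {
    case (a, b, c, d, Array(e0, e1, e2, e3, e4, e5, e6)) =>
      (a, b, c, d, e0, e1, e2, e3, e4, e5, e6)
  }

  // Made-up sample rows standing in for encode_cars.
  val sample: Seq[Encoded] = Seq(
    (1.0, 2.0, 3.0, 4.0, Array(1, 0, 0, 0, 0, 0, 0)),
    (5.0, 6.0, 7.0, 8.0, Array(0, 0, 0, 1, 0, 0, 0))
  )

  // Same call shape as RDD.map: one flat tuple per input row.
  val flattened: Seq[Flat] = sample.map(flatten)
}
```

With the real RDD, the same function should be usable as encode_cars.map(FlattenExample.flatten), assuming encode_cars has the element type shown in the question.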