I have a DataFrame with the following schema:
root
|-- date: timestamp (nullable = true)
|-- questionAnswerList: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- questionNumber: string (nullable = true)
| | |-- listAnswers: array (nullable = true)
| | | |-- element: string (containsNull = true)
I want to add a new field, index, to each struct in the array, so that the schema becomes:
root
|-- date: timestamp (nullable = true)
|-- questionAnswerList: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- index: integer (nullable = true)
| | |-- questionNumber: string (nullable = true)
| | |-- listAnswers: array (nullable = true)
| | | |-- element: string (containsNull = true)
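In plain Scala terms, the requested change is a zipWithIndex over each row's array: every element keeps its two fields and gains its position. The sketch below models one row with hypothetical case classes (QuestionAnswer, IndexedQuestionAnswer, IndexModel are illustrative names, not from the post), leaves out the date column, and assumes a 0-based index, which is what zipWithIndex produces.

```scala
// Hypothetical case classes mirroring one element of questionAnswerList
// before and after the change (the date column is left out).
case class QuestionAnswer(questionNumber: String, listAnswers: Seq[String])
case class IndexedQuestionAnswer(index: Int, questionNumber: String, listAnswers: Seq[String])

object IndexModel {
  // Pair every element with its 0-based position and copy its fields across.
  def addIndex(list: Seq[QuestionAnswer]): Seq[IndexedQuestionAnswer] =
    list.zipWithIndex.map { case (qa, i) =>
      IndexedQuestionAnswer(i, qa.questionNumber, qa.listAnswers)
    }

  def main(args: Array[String]): Unit = {
    val row = Seq(
      QuestionAnswer("Q10", Seq("R10.1", "R10.2")),
      QuestionAnswer("Q11", Seq("R11.1")))
    addIndex(row).foreach(println)
  }
}
```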
I tried a UDF like this:
val addIndexInStruct: UserDefinedFunction = udf((data: Seq[Row]) => {
  data.zipWithIndex.map { case (Row(x: String, y: Array[String]), index) => (index, x, y) }
})
df.withColumn("newCol", addIndexInStruct($"questionAnswerList")).show(false)
But I get the following error:
Caused by: scala.MatchError: ([Q10,WrappedArray(R10.1, R10.2)],0) (of class scala.Tuple2)
Does anybody have an idea how to do this in Spark 2.x? I saw in other posts that in Spark 3.x the transform function can be used.
Answer:
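I finally solved it. The pattern match has to use Seq instead of Array: Spark passes an array field to a Scala UDF as a Seq (a WrappedArray, as the error message shows), so the pattern y: Array[String] never matches and a scala.MatchError is thrown.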
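The mismatch can be reproduced without Spark. In this sketch a plain tuple stands in for the Row (an assumption made only to keep it Spark-free), and its second field is a Seq, which is the kind of value Spark hands over for an array<string> field. The object and method names are hypothetical.

```scala
import scala.util.Try

object PatternDemo {
  // Stand-in for one struct element: the array<string> field arrives as a Seq,
  // never as a JVM Array.
  val element: (Any, Any) = ("Q10", Seq("R10.1", "R10.2"))

  // Same shape as the question's pattern: Array[String] only matches a real JVM array.
  def withArrayPattern(e: (Any, Any)): String = e match {
    case (x: String, y: Array[String]) => s"$x -> ${y.mkString(",")}"
  }

  // Same shape as the answer's pattern: any Seq matches. The element type is
  // erased at runtime, hence the @unchecked annotation to silence the warning.
  def withSeqPattern(e: (Any, Any)): String = e match {
    case (x: String, y: Seq[String @unchecked]) => s"$x -> ${y.mkString(",")}"
  }

  def main(args: Array[String]): Unit = {
    println(Try(withArrayPattern(element))) // a Failure wrapping scala.MatchError
    println(withSeqPattern(element))        // Q10 -> R10.1,R10.2
  }
}
```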
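val addIndexInStruct: UserDefinedFunction = udf((data: Seq[Row]) => {
  // Array fields arrive as Seq (WrappedArray), so match on Seq[String], not Array[String].
  // The returned tuples become structs whose fields are named _1, _2 and _3.
  data.zipWithIndex.map { case (Row(x: String, y: Seq[String]), index) => (index, x, y) }
})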
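Two refinements may be worth considering; this is a sketch, not the answer's own code. Returning a case class instead of a tuple gives the struct fields the names from the target schema (a tuple yields _1, _2, _3), and handling null arrays and null elements avoids another MatchError, since the schema allows both. For Spark 2.4 and later, to my knowledge, the SQL higher-order function transform is already usable through expr, with an optional second lambda argument holding the 0-based position; the Scala functions.transform helper only appeared in 3.0. The case classes, object and method names below are hypothetical, and the sample data omits the date column.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, expr, udf}

// Hypothetical case classes: their field names become the struct field names.
case class QA(questionNumber: String, listAnswers: Seq[String])
case class IndexedQA(index: Int, questionNumber: String, listAnswers: Seq[String])

object AddIndexSketch {
  // UDF variant for any Spark 2.x: struct elements arrive as Row, arrays as Seq.
  // A null array or a null element is passed through as null.
  val addIndexInStruct = udf((data: Seq[Row]) =>
    if (data == null) null
    else data.zipWithIndex.map {
      case (null, _) => null
      case (r, i)    => IndexedQA(i, r.getString(0), r.getSeq[String](1))
    })

  def withIndexUdf(df: DataFrame): DataFrame =
    df.withColumn("questionAnswerList", addIndexInStruct(col("questionAnswerList")))

  // Spark 2.4+ variant: SQL transform via expr; i is the element's 0-based position.
  def withIndexTransform(df: DataFrame): DataFrame =
    df.withColumn("questionAnswerList", expr(
      "transform(questionAnswerList, (x, i) -> named_struct(" +
        "'index', i, 'questionNumber', x.questionNumber, 'listAnswers', x.listAnswers))"))

  // Small sample shaped like the question's data (date column omitted).
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(Tuple1(Seq(QA("Q10", Seq("R10.1", "R10.2")), QA("Q11", Seq("R11.1")))))
      .toDF("questionAnswerList")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("add-index").getOrCreate()
    val df = sample(spark)
    withIndexUdf(df).printSchema()
    withIndexTransform(df).show(false)
    spark.stop()
  }
}
```

The transform variant avoids converting every row to Scala objects and back, and stays visible to the Catalyst optimizer, so it is usually preferable where the Spark version allows it.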