Spark - Update a nested column to string

Time:04-28

 |-- x: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- y: struct (nullable = true)
 |    |    |-- z: struct (nullable = true)
 |    |    |    |-- aa: string (nullable = true)

I have the nested schema above and want to change column z from a struct to a string, so that the schema becomes:

 |-- x: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- y: struct (nullable = true)
 |    |    |-- z: string (nullable = true)

I'm on Spark 2.4.x, not Spark 3. I would prefer a Scala solution, but Python works too, since this is a one-time manual job to backfill some past data.

Is there a way to do this with a UDF, or in some other way?

I know this is easy to do with to_json on a plain struct, but here the struct is nested inside an array of structs, which is causing issues.

CodePudding user response:

For your specific case, you can do it with built-in functions on both Spark 2.4 and Spark 3.0.

Spark 2.4

You can use arrays_zip as follows:

  • first, create one array for each field you want in the element struct of your array
  • second, use arrays_zip to zip those arrays back into a single array of structs

Here is the complete code, where df is your input DataFrame:

import org.apache.spark.sql.functions.{arrays_zip, col}

df.withColumn("x",
      arrays_zip(
        col("x").getField("y").alias("y"),
        col("x").getField("z").getField("aa").alias("z")
      ))
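To make the two steps concrete, here is a minimal plain-Python sketch of what they do to the x value of a single row. It does not use Spark at all: the function name rebuild_x, the sample data, and the field k inside y are all hypothetical, chosen only for illustration. It models the reshaping, not Spark's actual implementation.

```python
# Plain-Python model of the two steps (no Spark involved).
# The "x" column of one row is modelled as a list of dicts.


def rebuild_x(x):
    """Return x with each element's z struct replaced by its aa string."""
    if x is None:
        return None
    # Step 1: build one array per output field,
    # mirroring col("x").getField("y") and col("x").getField("z").getField("aa").
    ys = [None if e is None else e["y"] for e in x]
    zs = [None if e is None or e["z"] is None else e["z"]["aa"] for e in x]
    # Step 2: zip the parallel arrays back into an array of structs,
    # mirroring arrays_zip.
    return [{"y": y, "z": z} for y, z in zip(ys, zs)]


# Hypothetical sample data: the field "k" inside y is made up.
row_x = [
    {"y": {"k": 1}, "z": {"aa": "first"}},
    {"y": {"k": 2}, "z": None},
]
print(rebuild_x(row_x))
```

Both arrays are derived from the same x, so they always have the same length and the zip pairs each y with the z of the same element.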

Spark 3.0

You can use transform to rebuild the element struct of your array, as follows:

import org.apache.spark.sql.functions.{col, struct, transform}

df.withColumn("x", transform(
      col("x"),
      element => struct(
        element.getField("y").alias("y"),
        element.getField("z").getField("aa").alias("z")
      )
    ))

CodePudding user response:

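You can also do the cast inside a higher-order function, using expr from pyspark.sql.functions. Note that printSchema() returns None, so assign the DataFrame first and print its schema separately:

df3 = df.withColumn('x', expr('transform(x, s -> struct(s.y, cast(s.z as string) as z))'))
df3.printSchema()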

root
 |-- x: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- y: struct (nullable = true)
 |    |    |-- z: string (nullable = true)