Modifying element in nested array of struct

Time:04-20

I have a nested array of structs and would like to rename one of the struct fields, as shown in the example below.

Source format

 |-- HelloWorld: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- version: string (nullable = true)
 |    |    |-- abc-version: string (nullable = true) -----> This part needs to be renamed
 |    |    |-- again_something: array (nullable = true)
 |    |    |    |-- element: map (containsNull = true)
 |    |    |    |    |-- key: string
 |    |    |    |    |-- value: string (valueContainsNull = true)

Output format should look like below.

 |-- HelloWorld: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- version: string (nullable = true)
 |    |    |-- abc_version: string (nullable = true) -----> This part has changed
 |    |    |-- again_something: array (nullable = true)
 |    |    |    |-- element: map (containsNull = true)
 |    |    |    |    |-- key: string
 |    |    |    |    |-- value: string (valueContainsNull = true)

I tried several approaches with withField and F.expr to rename the field, but none of them worked.

Please help.

CodePudding user response:

I would rebuild each struct inside the array, keeping the same data types and field order and changing only the field name (F here is pyspark.sql.functions):

 df3 = df.withColumn("HelloWorld", F.expr("transform(HelloWorld, x -> struct(x.version as version, x['abc-version'] as abc_version, x.again_something as again_something))"))

The resulting schema:

root
 |-- HelloWorld: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- version: string (nullable = true)
 |    |    |-- abc_version: string (nullable = true)
 |    |    |-- again_something: array (nullable = true)
 |    |    |    |-- element: map (containsNull = true)
 |    |    |    |    |-- key: string
 |    |    |    |    |-- value: string (valueContainsNull = true)
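
If the struct has many fields, writing the transform expression by hand gets error-prone. The following is a minimal sketch, not part of the original answer, of a helper that builds the expression string from a field list and a rename mapping. The function name build_rename_expr and its parameters are hypothetical, and it assumes field names contain no backtick characters.

```python
def build_rename_expr(array_col, fields, renames):
    """Build a Spark SQL transform() expression string that rebuilds each
    struct in `array_col`, keeping `fields` in the given order and renaming
    any field that appears as a key in `renames`.

    Hypothetical helper for illustration; assumes no name contains a backtick.
    """
    parts = []
    for name in fields:
        new_name = renames.get(name, name)
        # Backtick quoting lets Spark SQL parse names containing '-' etc.
        parts.append(f"x.`{name}` as `{new_name}`")
    return f"transform(`{array_col}`, x -> struct({', '.join(parts)}))"


expr = build_rename_expr(
    "HelloWorld",
    ["version", "abc-version", "again_something"],
    {"abc-version": "abc_version"},
)
print(expr)

# With a live DataFrame `df` (assumed here, not created in this sketch),
# the string could then be applied as:
#   df.withColumn("HelloWorld", F.expr(expr))
```

Because the helper only produces a string, it can be checked without a Spark session; the field list has to be supplied in the order the output struct should have.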