I have a nested array of structs and would like to rename one of the struct fields, as shown in the example below.
Source format
|-- HelloWorld: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- version: string (nullable = true)
| | |-- abc-version: string (nullable = true) ----->This part needs to be renamed
| | |-- again_something: array (nullable = true)
| | | |-- element: map (containsNull = true)
| | | | |-- key: string
| | | | |-- value: string (valueContainsNull = true)
The output format should look like this:
|-- HelloWorld: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- version: string (nullable = true)
| | |-- abc_version: string (nullable = true) ----->This part has changed
| | |-- again_something: array (nullable = true)
| | | |-- element: map (containsNull = true)
| | | | |-- key: string
| | | | |-- value: string (valueContainsNull = true)
I tried withField and F.expr to rename the field, but could not get either to work.
Please help.
CodePudding user response:
I would rebuild each struct with transform, casting the field to the same dtype while giving it the new name. The other fields have to be listed explicitly, otherwise they are dropped from the struct.
from pyspark.sql import functions as F
df3 = df.withColumn("HelloWorld", F.expr("transform(HelloWorld, x -> struct(x.version as version, cast(x['abc-version'] as string) as abc_version, x.again_something as again_something))"))
root
|-- HelloWorld: array (nullable = true)
| |-- element: struct (containsNull = false)
| | |-- version: string (nullable = true)
| | |-- abc_version: string (nullable = true)
| | |-- again_something: array (nullable = true)
| | | |-- element: map (containsNull = true)
| | | | |-- key: string
| | | | |-- value: string (valueContainsNull = true)
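When the struct has many fields, writing the transform/struct expression by hand gets error-prone. As a rough sketch (not part of the original answer), the expression string could be generated from a list of field names. The helper name `build_rename_expr` and its parameters are hypothetical, and the field list is copied from the schema in the question.

```python
def build_rename_expr(array_col, fields, renames):
    """Return a Spark SQL transform() expression string that rebuilds each
    struct element of `array_col`, renaming fields according to `renames`.
    (Hypothetical helper, for illustration only.)"""
    parts = []
    for name in fields:
        new_name = renames.get(name, name)
        # Backticks let Spark SQL parse names containing characters such as '-'.
        parts.append(f"x.`{name}` as `{new_name}`")
    return f"transform(`{array_col}`, x -> struct({', '.join(parts)}))"


# Field names taken from the schema shown in the question.
fields = ["version", "abc-version", "again_something"]
expr = build_rename_expr("HelloWorld", fields, {"abc-version": "abc_version"})
print(expr)
```

The resulting string is meant to be passed to `F.expr`, e.g. `df.withColumn("HelloWorld", F.expr(expr))`. Instead of hard-coding `fields`, it should be possible to read them from `df.schema["HelloWorld"].dataType.elementType.fieldNames()`, assuming the column really is an array of structs. Note that this only renames top-level fields of the struct; deeper nesting would need the same treatment applied recursively.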