I have a dataframe containing an array of structs. I would like to add the index of the array as a field within the struct. Is this possible?
So the structure would go from:
|-- my_array_column: array
| |-- element: struct
| | |-- field1: string
| | |-- field2: string
to:
|-- my_array_column: array
| |-- element: struct
| | |-- field1: string
| | |-- field2: string
| | |-- index of element: integer
Many thanks
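For Spark 3.1+, you can use the transform function together with Column.withField to update each struct element of the array column. When the lambda takes two arguments, transform passes the element and its 0-based position, like this: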
from pyspark.sql import functions as F
df = df.withColumn(
"my_array_column",
    # x is the struct element, i its 0-based position; withField appends "index" to the struct
    F.transform("my_array_column", lambda x, i: x.withField("index", i))
)
For older versions (Spark 2.4 to 3.0), withField is not available, so you'll have to rebuild the whole struct element to add a field, using the SQL transform function through expr:
df = df.withColumn(
"my_array_column",
F.expr("transform(my_array_column, (x, i) -> struct(x.field1 as field1, x.field2 as field2, i as index))")
)