I wrote code that transforms a string into an array of structs. I would like to do the same in Python. Do you have any clue how I can do it?
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column

val df: DataFrame = Seq(
  "adserviceCalculateCpcAlgorithmV1:2;searchProductsDecorator:3;searchOffersDecorator:3;bundlediscounts:5;searchGridType:3"
).toDF("abTests")

display(
  df
    .withColumn("abTestsArr", split($"abTests", ";"))
    .withColumn("abTestsArr",
      // build a struct(name, group) from each "name:group" entry
      transform(col("abTestsArr"), (c: Column) =>
        struct(
          split(c, ":").getItem(0) as "name",
          split(c, ":").getItem(1) as "group"
        )
      )
    )
)
Answer:
You can do the same in Python by passing a lambda expression as the second argument to the transform function:
from pyspark.sql import functions as F

# assuming `spark` is an active SparkSession; recreate the sample data in Python
df = spark.createDataFrame(
    [("adserviceCalculateCpcAlgorithmV1:2;searchProductsDecorator:3;searchOffersDecorator:3;bundlediscounts:5;searchGridType:3",)],
    ["abTests"],
)

df.withColumn(
    "abTestsArr",
    F.transform(
        F.split("abTests", ";"),
        # substring_index(x, ":", 1) keeps what precedes the first ":",
        # substring_index(x, ":", -1) what follows the last ":"
        lambda x: F.struct(
            F.substring_index(x, ":", 1).alias("name"),
            F.substring_index(x, ":", -1).alias("group"),
        ),
    ),
).show(truncate=False)
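If you prefer a line-for-line translation of the Scala version, the same split/getItem calls work inside the lambda as well (a minimal sketch; it assumes Spark 3.1+, where F.transform accepts a Python lambda):

df.withColumn(
    "abTestsArr",
    F.transform(
        F.split("abTests", ";"),
        # split each "name:group" entry and pick the two parts
        lambda x: F.struct(
            F.split(x, ":").getItem(0).alias("name"),
            F.split(x, ":").getItem(1).alias("group"),
        ),
    ),
).show(truncate=False)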
Instead of parsing the string yourself, you could also use str_to_map to get a MapType column and then convert it into an array of structs with the map_entries function:
df.withColumn(
    "abTestsMap",
    # str_to_map parses "k1:v1;k2:v2;..." into a map; map_entries then
    # converts the map into an array of key/value structs
    F.map_entries(F.expr("str_to_map(abTests, ';', ':')"))
).show(truncate=False)
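One caveat with this route: map_entries yields structs whose fields are named key and value, not name and group. If the field names matter downstream, a possible follow-up (again just a sketch) is to rename them with transform:

df.withColumn(
    "abTestsArr",
    F.transform(
        F.map_entries(F.expr("str_to_map(abTests, ';', ':')")),
        # rename the default key/value struct fields to name/group
        lambda e: F.struct(e["key"].alias("name"), e["value"].alias("group")),
    ),
).show(truncate=False)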