How to create the same array of structs from a string in PySpark?


I wrote Scala code that transforms a string into an array of structs. I would like to do the same in Python (PySpark). Do you have any idea how I can do it?

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column

// Databricks notebook: display() and spark.implicits._ are available by default
val df: DataFrame = Seq(
  "adserviceCalculateCpcAlgorithmV1:2;searchProductsDecorator:3;searchOffersDecorator:3;bundlediscounts:5;searchGridType:3"
).toDF("abTests")

display(
  df
    .withColumn("abTestsArr", split($"abTests", ";"))  // split into "name:group" pairs
    .withColumn("abTestsArr",
      transform(col("abTestsArr"), (c: Column) => {
        struct(
          split(c, ":").getItem(0) as "name",   // part before ":"
          split(c, ":").getItem(1) as "group"   // part after ":"
        )
      })
    )
)

CodePudding user response:

You would do the same in Python, passing a lambda expression as the second parameter of the transform function:

from pyspark.sql import functions as F

df.withColumn(
    "abTestsArr",
    F.transform(
        F.split("abTests", ";"),
        # substring_index takes the part before (1) or after (-1) the ":" separator
        lambda x: F.struct(
            F.substring_index(x, ":", 1).alias("name"),
            F.substring_index(x, ":", -1).alias("group")
        )
    )
).show(truncate=False)
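
For reference, here is a self-contained sketch of that approach (assuming Spark 3.1+, where F.transform accepts a Python lambda) that builds the same single-row DataFrame as the Scala example and mirrors its split/getItem logic; the resulting abTestsArr column has type array<struct<name:string,group:string>>:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Same single-row input as in the Scala example
df = spark.createDataFrame(
    [("adserviceCalculateCpcAlgorithmV1:2;searchProductsDecorator:3;searchOffersDecorator:3;bundlediscounts:5;searchGridType:3",)],
    ["abTests"],
)

df.withColumn(
    "abTestsArr",
    F.transform(
        F.split("abTests", ";"),
        # mirror the Scala version: split each "name:group" pair and wrap it in a struct
        lambda x: F.struct(
            F.split(x, ":").getItem(0).alias("name"),
            F.split(x, ":").getItem(1).alias("group"),
        ),
    ),
).show(truncate=False)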

Instead of parsing it yourself, you could also consider using str_to_map to get a MapType column and then converting it into an array of structs with the map_entries function:

df.withColumn(
    "abTestsMap",
    # str_to_map parses the "name:group;name:group" string into a map,
    # and map_entries converts that map into an array of <key, value> structs
    F.map_entries(F.expr("str_to_map(abTests, ';', ':')"))
).show(truncate=False)
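
Note that map_entries yields structs whose fields are named key and value rather than name and group. If the original field names are needed, one possible sketch (again assuming Spark 3.1+ for the lambda form of F.transform) renames them:

df.withColumn(
    "abTestsArr",
    F.transform(
        # str_to_map parses "name:group" pairs separated by ";" into a map,
        # and map_entries turns that map into an array<struct<key,value>>
        F.map_entries(F.expr("str_to_map(abTests, ';', ':')")),
        # rename the key/value fields of each entry to name/group
        lambda e: F.struct(e["key"].alias("name"), e["value"].alias("group")),
    ),
).show(truncate=False)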