map function on StructType in PySpark


I have a StructType as follows:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

to_Schema = StructType([StructField('name', StringType(), True),
                        StructField('sales', IntegerType(), True)])

Both fields in dataframe_1 are StringType, so I created the above StructType to typecast the fields in dataframe_1.

I am able to do it in Scala:

val df2 = dataframe_1.selectExpr(to_Schema.map(
  col => s"CAST ( ${col.name} As ${col.dataType.sql}) ${col.name}"
): _*)

I am not able to use the same map function in Python, since PySpark's StructType has no map method.

I've tried using a for loop, but it doesn't work as expected.

I am looking for a PySpark equivalent of the above Scala code.

CodePudding user response:

The code below achieves the same thing in Python, casting each column in place:

df2 = dataframe_1
for s in to_Schema:
    df2 = df2.withColumn(s.name, df2[s.name].cast(s.dataType))
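For context, here is a minimal runnable sketch of that loop; the SparkSession setup and the sample data are assumptions, not part of the original question:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: both columns start out as strings.
dataframe_1 = spark.createDataFrame([('a', '100'), ('b', '200')], ['name', 'sales'])

to_Schema = StructType([StructField('name', StringType(), True),
                        StructField('sales', IntegerType(), True)])

df2 = dataframe_1
for s in to_Schema:
    df2 = df2.withColumn(s.name, df2[s.name].cast(s.dataType))

df2.printSchema()  # 'sales' is now integer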

You can also create a new dataframe from the old one using the new schema, as shown in this answer. Note that this does not cast anything: the row values must already match the new types, or Spark will raise an error when it verifies the rows against the schema:

df2 = spark.createDataFrame(dataframe_1.rdd, to_Schema)
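If the values are still strings, the rows have to be converted before the schema is reapplied. A sketch under that assumption (it presumes the 'sales' strings are clean integers):

# createDataFrame does not cast for you, so convert each row's values
# to the target Python types first, then reapply the schema.
rdd = dataframe_1.rdd.map(lambda row: (row['name'], int(row['sales'])))
df2 = spark.createDataFrame(rdd, to_Schema)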

CodePudding user response:

This would be the direct translation:

df2 = dataframe_1.selectExpr(*[f"CAST ({c.name} AS {c.dataType.simpleString()}) {c.name}" for c in to_Schema])
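To see what selectExpr actually receives, the expression list can be built separately; with the schema above, simpleString() yields 'string' and 'int':

exprs = [f"CAST ({c.name} AS {c.dataType.simpleString()}) {c.name}" for c in to_Schema]
# exprs == ['CAST (name AS string) name', 'CAST (sales AS int) sales']
df2 = dataframe_1.selectExpr(*exprs)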

It can be simplified:

from pyspark.sql.functions import col

df2 = dataframe_1.select([col(c.name).cast(c.dataType).alias(c.name) for c in to_Schema])
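Whichever variant you pick, the result can be checked with printSchema(); assuming the casts succeeded, it should show:

df2.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- sales: integer (nullable = true)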

However, I like this answer more ;)
