I have a StructType as follows:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

to_Schema = StructType([StructField('name', StringType(), True),
                        StructField('sales', IntegerType(), True)])
The dataframe_1 has both fields as StringType, so I created the above StructType so that I could use it to typecast the fields in dataframe_1.
I am able to do it in Scala:
val df2 = dataframe_1.selectExpr(to_Schema.map(
  col => s"CAST(${col.name} AS ${col.dataType.sql}) ${col.name}"
): _*)
I am not able to use the same map function in Python, as StructType has no map method. I've tried using a for loop, but it doesn't work as expected.
I am looking for a PySpark equivalent of the above Scala code.
CodePudding user response:
The below code will achieve the same thing in Python:
df2 = dataframe_1
for s in to_Schema:
    df2 = df2.withColumn(s.name, df2[s.name].cast(s.dataType))
You can also create a new dataframe from the old one using the new schema, as shown in this answer:
df2 = spark.createDataFrame(dataframe_1.rdd, to_Schema)
Note that this applies the schema rather than casting the values, so it can fail at runtime if the underlying data are still strings.
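For reference, here is a minimal, self-contained sketch of the loop approach, assuming a local SparkSession and toy data (the session setup and sample rows are illustrative, not from the question):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Both columns start out as strings, mirroring dataframe_1 in the question.
dataframe_1 = spark.createDataFrame([("Alice", "10"), ("Bob", "20")],
                                    ["name", "sales"])

to_Schema = StructType([StructField('name', StringType(), True),
                        StructField('sales', IntegerType(), True)])

# Cast each column to the type declared for it in to_Schema.
df2 = dataframe_1
for s in to_Schema:
    df2 = df2.withColumn(s.name, df2[s.name].cast(s.dataType))

df2.printSchema()  # sales is now an integer column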
CodePudding user response:
This would be the direct translation:
df2 = dataframe_1.selectExpr(*[
    f"CAST({c.name} AS {c.dataType.simpleString()}) {c.name}"
    for c in to_Schema
])
It could be simplified (note that col needs to be imported):
from pyspark.sql.functions import col

df2 = dataframe_1.select([col(c.name).cast(c.dataType).alias(c.name)
                          for c in to_Schema])
However, I like this answer more ;)
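For completeness, a self-contained sketch of both variants, using an assumed local SparkSession and toy data to check that they produce the same casted schema:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Toy data with string-typed columns, standing in for dataframe_1.
dataframe_1 = spark.createDataFrame([("Alice", "10")], ["name", "sales"])
to_Schema = StructType([StructField('name', StringType(), True),
                        StructField('sales', IntegerType(), True)])

df2_expr = dataframe_1.selectExpr(*[
    f"CAST({c.name} AS {c.dataType.simpleString()}) {c.name}"
    for c in to_Schema
])
df2_col = dataframe_1.select([col(c.name).cast(c.dataType).alias(c.name)
                              for c in to_Schema])

# Both variants yield the same casted schema.
print(df2_expr.schema == df2_col.schema)  # True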