I need to create a new column in a PySpark DataFrame, but the expression that defines this column will be dynamic.
For example:
df = df.withColumn(
    'update_date',
    to_date(
        substring(df['update_date_string'], -8, 8),
        'MM-dd-yy',
    ),
)
I want to convert this to:
column_expression = """to_date(
    substring(df['update_date_string'], -8, 8),
    'MM-dd-yy',
)"""
df = df.withColumn(
    'update_date',
    expr(column_expression),
)
The second version, using expr(), does not create the new column. How can I fix this?
CodePudding user response:
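expr() expects a SQL expression string, not Python code (docs: https://sparkbyexamples.com/pyspark/pyspark-sql-expr-expression-function/). Inside the string, refer to the column by its name, update_date_string, instead of df['update_date_string']. Try: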
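from pyspark.sql.functions import expr

# SQL expression: refer to the column by name, not df['...']
column_expression = """to_date(
    substring(update_date_string, -8, 8),
    'MM-dd-yy'
)"""
df = df.withColumn(
    'update_date',
    expr(column_expression),
)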
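Since the expression is meant to be dynamic, one option is to build the SQL string from parameters and then pass it to expr(). The following is only a minimal sketch: build_date_expr, add_columns, and the expressions dict are hypothetical names that do not come from the question, and it assumes the column names and format strings come from a trusted source, because they are interpolated directly into SQL.

```python
def build_date_expr(column, width=8, fmt="MM-dd-yy"):
    """Return a Spark SQL expression string that parses the last
    `width` characters of `column` as a date with pattern `fmt`."""
    return f"to_date(substring({column}, -{width}, {width}), '{fmt}')"


def add_columns(df, expressions):
    """Add one column to `df` per (name, SQL string) pair in `expressions`."""
    # Imported here so build_date_expr can be used without PySpark installed.
    from pyspark.sql.functions import expr

    for name, sql in expressions.items():
        df = df.withColumn(name, expr(sql))
    return df


# Hypothetical configuration: new column name -> SQL expression string.
expressions = {"update_date": build_date_expr("update_date_string")}
print(expressions["update_date"])
# prints: to_date(substring(update_date_string, -8, 8), 'MM-dd-yy')
```

With an existing DataFrame, df = add_columns(df, expressions) would apply every configured expression in turn, so new columns can be added by changing the dict rather than the code.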