I have a date column in my dataframe stored as a string, in the format dd/MM/yyyy.
When I try to convert the string to a date, every function I have tried returns null values.
I am looking to convert the column to DateType.
CodePudding user response:
It looks like your date strings contain quotes. You need to remove them, for example with regexp_replace, before calling to_date:
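import pyspark.sql.functions as F
# Sample data: the date strings are wrapped in single quotes
df = spark.createDataFrame([("'31-12-2021'",), ("'30-11-2021'",), ("'01-01-2022'",)], ["Birth_Date"])
# Remove the quotes, then parse with a pattern that matches the remaining text
df = df.withColumn(
    "Birth_Date",
    F.to_date(F.regexp_replace("Birth_Date", "'", ""), "dd-MM-yyyy")
)
df.show()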
#+----------+
#|Birth_Date|
#+----------+
#|2021-12-31|
#|2021-11-30|
#|2022-01-01|
#+----------+
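
If you want to check the diagnosis without a Spark session, the same idea can be reproduced in plain Python. This is only a rough sketch: parse_birth_date is a hypothetical helper, not part of PySpark, and the sample values assume the dd/MM/yyyy format from the question wrapped in single quotes. It returns None when parsing fails, to mimic the nulls described in the question.

```python
from datetime import date, datetime
from typing import Optional


def parse_birth_date(raw: str, fmt: str = "%d/%m/%Y",
                     strip_quotes: bool = True) -> Optional[date]:
    """Hypothetical helper: parse a date string, returning None on failure."""
    text = raw.strip()
    if strip_quotes:
        # Drop any single or double quotes wrapped around the value
        text = text.strip("'\"")
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        # The text does not match the pattern (stray quote, wrong separator, ...)
        return None


# Assumed sample values: quoted strings in dd/MM/yyyy form
samples = ["'31/12/2021'", "'30/11/2021'", "'01/01/2022'"]
for s in samples:
    print(s, parse_birth_date(s, strip_quotes=False), parse_birth_date(s))
```

With the quotes left in, every value fails to parse. Once they are stripped, the pattern still has to use the same separator as the data: slashes for dd/MM/yyyy, dashes for dd-MM-yyyy.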