Home > database >  How to cast a string column to date having two different types of date formats in Pyspark
How to cast a string column to date having two different types of date formats in Pyspark

Time:11-29

I have a dataframe column which is of type string and has dates in it. I want to cast the column from string to date but the coolumn contains two types of date formats.

I tried using the to_date fuction but it is not working as expected and giving null values after applying function.

Below are the two date formats I am getting in the df col(datatype - string) enter image description here

I tried applying the to_date function and below are the results enter image description here

enter image description here

Please let me know how we can solve this issue and get the date column in only one format

Thanks in advance

CodePudding user response:

You can use pyspark.sql.functions.coalesce to return the first non-null result in a list of columns. So the trick here is to parse using multiple formats and take the first non-null one:

from pyspark.sql import functions as F

df = spark.createDataFrame([
    ("9/1/2022",),
    ("2022-11-24",),
], ["Alert Release Date"])


x = F.col("Alert Release Date")

df.withColumn("date", F.coalesce(F.to_date(x, "M/d/yyyy"), F.to_date(x, "yyyy-MM-dd"))).show()
 ------------------ ---------- 
|Alert Release Date|      date|
 ------------------ ---------- 
|          9/1/2022|2022-09-01|
|        2022-11-24|2022-11-24|
 ------------------ ---------- 
  • Related