Home > other >  Format for dates in pyspark
Format for dates in pyspark

Time:09-17

I have a data frame with a date column where there are different type of date formats present. I want to validate and extract only the records with date format 'MM-dd-yy' but when I validate and extract I am also getting the records with format ''MM-dd-yyyy'. What is the correct format to extract the records with format ''MM-dd-yy'?

from pyspark.sql import functions as F,Window
df = sc.parallelize([['12-21-20'],
                     ['05-30-2020'],
                     ['01-01-1984'],
                     ['12-24-20']]).toDF(["Date"])

df.show()
 ---------- 
|      Date|
 ---------- 
|  12-21-20|
|05-30-2020|
|01-01-1984|
|  12-24-20|
 ---------- 
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
df=df.filter(~F.col("Date").isNotNull()|
                                           to_date(F.col("Date"),'MM-dd-yy').isNotNull()) 
display(df)
 ---------- 
|      Date|
 ---------- 
|  12-21-20|
|05-30-2020|
|01-01-1984|
|  12-24-20|
 ---------- 

Expected output:

 ---------- 
|      Date|
 ---------- 
|  12-21-20|
|  12-24-20|
 ---------- 

CodePudding user response:

I found a simple way not using to_date but rather regexp_extract

df = df.filter((F.regexp_extract('Date', '(\d{2}-\d{2}-\d{2})$', 1) != '') & 
               (F.to_date('Date', 'MM-dd-yy').isNotNull()))
df.show()

 -------- 
|    Date|
 -------- 
|12-21-20|
|12-24-20|
 -------- 

CodePudding user response:

You can specify format as MM-dd-yyyy, it will append 00 to the year with 2 digits and you can filter based on that.

df.where(to_date(F.col("Date"),'MM-dd-yyyy').rlike('^00[1-9] ') &
to_date(F.col("Date"),'MM-dd-yyyy').isNotNull()).show()

 -------- 
|    Date|
 -------- 
|12-21-20|
|12-24-20|
 -------- 

CodePudding user response:

using rlike and to_date

>>> df.filter((f.col("Date").rlike('^\d{2}-\d{2}-\d{2}$')) 
            & (f.to_date(f.col("Date"), "MM-dd-yy")).isNotNull()).show()
 -------- 
|    Date|
 -------- 
|12-21-20|
|12-24-20|
 -------- 
  • Related