Home > Net >  Parse date with slashes in Pyspark
Parse date with slashes in Pyspark

Time:08-19

There is pyspark dataframe like:

df = spark.createDataFrame([('7/1/20', )],['DATE'])
df.show()
 ------ 
|  DATE|
 ------ 
|7/1/20|
 ------ 

Note! data format is (m)m/(d)d/yy, like: 1/25/20, 4/5/20, 11/2/20 etc.

I tried dessisions is here, but it returns empty dataframe. Example:

df = df.withColumn("DATE_1",F.to_date(F.col("DATE"),"%m/%d/%y"))
df.show()

But it returns Nan:

 ------ ------ 
|  DATE|DATE_1|
 ------ ------ 
|7/1/20|  null|
 ------ ------ 

What can I do for parsing dates like this?

CodePudding user response:

I'm using pyspark==3.2.1 and needed to set the configuration for spark.sql.legacy.timeParserPolicy to LEGACY to use the following solution:

spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')
import pyspark.sql.functions as f
df = spark.createDataFrame([
    ('7/1/20',)
], ['Date'])

df = (
    df.withColumn('Date_1', f.to_date(f.col('Date'), 'MM/dd/yy'))
)

df.show(truncate= False)

output:

 ------ ----------                                                              
|Date  |Date_1    |
 ------ ---------- 
|7/1/20|2020-07-01|
 ------ ---------- 

And if you don't want to set the configuration for spark.sql.legacy.timeParserPolicy to LEGACY, you can use this solution:

import pyspark.sql.functions as f
df = spark.createDataFrame([
    ('7/1/20',),
    ('10/1/20',),
    ('7/10/20',),
    ('10/10/20',)
], ['Date'])

df = (
    df
    .withColumn('Date_Converted', f.regexp_replace(f.col('Date'), '^([0-9]{1}/)', '0$1'))
    .withColumn('Date_Converted', f.regexp_replace(f.col('Date_Converted'), '/([0-9]{1}/)', '/0$1'))
    .withColumn('Date_1', f.to_date(f.col('Date_Converted'), 'MM/dd/yy'))
)

output:

 -------- -------------- ----------                                             
|Date    |Date_Converted|Date_1    |
 -------- -------------- ---------- 
|7/1/20  |07/01/20      |2020-07-01|
|10/1/20 |10/01/20      |2020-10-01|
|7/10/20 |07/10/20      |2020-07-10|
|10/10/20|10/10/20      |2020-10-10|
 -------- -------------- ---------- 
  • Related