I have a date value in a column of string type that takes this format:
06-MAY-16 09.17.15
I want to convert it to this format:
20160506
I have tried using DATE_FORMAT(TO_DATE(<column>), 'yyyyMMdd'), but it returns NULL.
Does anyone know how to do this in PySpark or Spark SQL?
Thanks
CodePudding user response:
I've got it! This is the code I used which seems to have worked:
FROM_UNIXTIME(UNIX_TIMESTAMP(<column>, 'dd-MMM-yy HH.mm.ss'), 'yyyyMMdd')
Hope this helps others!
CodePudding user response:
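Your original attempt was close. Without a format argument, TO_DATE() expects a string in a default form such as yyyy-MM-dd, so it could not parse your value and returned NULL. You just need to pass the input format to TO_DATE(). This will work as well: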
DATE_FORMAT(TO_DATE(<col>, 'dd-MMM-yy HH.mm.ss'), 'yyyyMMdd')
And for pyspark:
import pyspark.sql.functions as F
df = df.withColumn('<col>', F.date_format(F.to_date(F.col('<col>'), 'dd-MMM-yy HH.mm.ss'), 'yyyyMMdd'))
CodePudding user response:
Convert your string to a date before you try to 'reformat' it.
- Convert pyspark string to date format --
to_timestamp(df.t, 'dd-MMM-yy HH.mm.ss').alias('my_date')
- Pyspark date yyyy-mmm-dd conversion --
date_format(col("my_date"), "yyyyMMdd")
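If you want to sanity-check the pattern without starting a Spark session, the same two steps (parse the string, then reformat it) can be sketched in plain Python. This is only an illustration and not part of the Spark API: to_yyyymmdd is a made-up helper name, the strptime directives are my mapping of the Spark pattern 'dd-MMM-yy HH.mm.ss', and it assumes an English locale so that MAY is recognised as a month abbreviation.

```python
from datetime import datetime


def to_yyyymmdd(raw: str) -> str:
    """Parse a string like '06-MAY-16 09.17.15' and return it as yyyyMMdd.

    Hypothetical helper for illustration only; assumes English month
    abbreviations (the default C/English locale).
    """
    # Step 1: parse. '%d-%b-%y %H.%M.%S' is the strptime counterpart of
    # the Spark pattern 'dd-MMM-yy HH.mm.ss'.
    parsed = datetime.strptime(raw, "%d-%b-%y %H.%M.%S")
    # Step 2: reformat. '%Y%m%d' corresponds to Spark's 'yyyyMMdd'.
    return parsed.strftime("%Y%m%d")


print(to_yyyymmdd("06-MAY-16 09.17.15"))
```

For the sample value from the question this should print 20160506. For real DataFrame columns, stick with the built-in Spark functions shown above rather than wrapping something like this in a UDF.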