Convert string to date in pyspark

Time:01-10

I have a date value in a column of string type that takes this format:

06-MAY-16 09.17.15

I want to convert it to this format:

20160506

I have tried using DATE_FORMAT(TO_DATE(<column>), 'yyyyMMdd') but a NULL value is returned.

How can I do this in PySpark or Spark SQL?

Thanks

CodePudding user response:

I've got it! This is the code I used which seems to have worked:

FROM_UNIXTIME(UNIX_TIMESTAMP(<column>, 'dd-MMM-yy HH.mm.ss'), 'yyyyMMdd')

Hope this helps others!

CodePudding user response:

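Your original attempt is close to the solution. Without a format argument, TO_DATE() expects the default yyyy-MM-dd layout, so a string like 06-MAY-16 09.17.15 cannot be parsed and NULL is returned. You just need to pass the format to TO_DATE(). This will work as well: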

DATE_FORMAT(TO_DATE(<col>, 'dd-MMM-yy HH.mm.ss'), 'yyyyMMdd')

And for pyspark:

import pyspark.sql.functions as F
df = df.withColumn('<col>', F.date_format(F.to_date(F.col('<col>'), 'dd-MMM-yy HH.mm.ss'), 'yyyyMMdd'))

CodePudding user response:

Convert your string to a date before you try to 'reformat' it.
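
To make that parse-then-format order concrete, here is a minimal plain-Python sketch using the standard library's datetime module rather than Spark. The helper name to_yyyymmdd is hypothetical, and note that Python's strptime directives (%d-%b-%y %H.%M.%S) are not the same as the Spark pattern letters used in the answers above. It also assumes an English locale for the month abbreviation.

```python
from datetime import datetime

# Hypothetical helper for illustration only; not part of Spark or of the answers above.
def to_yyyymmdd(value: str) -> str:
    # Step 1: parse the string into a real datetime using an explicit format.
    # %b matches the abbreviated month name (assumes an English locale).
    parsed = datetime.strptime(value, "%d-%b-%y %H.%M.%S")
    # Step 2: only now reformat the parsed value into the target layout.
    return parsed.strftime("%Y%m%d")

print(to_yyyymmdd("06-MAY-16 09.17.15"))  # 20160506
```

This can be handy for checking a format string locally on a sample value, but inside Spark the built-in functions shown above are the better fit because they run on the column without a Python round trip.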
