Unable to format timestamp in pyspark


I have CSV data like the below:

time_value,annual_salary
5/01/2019 1:02:16,120.56
06/01/2019 2:02:17,12800
7/01/2019 03:02:18,123.00
08/01/2019 4:02:19,123isdhad  

Now I want to convert `time_value` to a timestamp column. So I created a view over these records and tried to convert it, but it throws an error:

spark.sql("select to_timestamp(time_value,'M/dd/yyyy H:mm:ss') as time_value from table")  

Error:

Text '5/1/2019 1:02:16' could not be parsed

CodePudding user response:

Sid,

Judging from the error shown, this is a date-format issue.

Text '5/1/2019 1:02:16' could not be parsed

Your format is specified as 'M/dd/yyyy H:mm:ss'. In the text being parsed the day appears as /1/ (a single digit), but dd expects exactly two digits.

Please try the following format

'M/d/yyyy H:mm:ss'

CodePudding user response:

I tried your SQL with no problem. It may be an issue with the Spark version; I used 2.4.8.

(screenshot omitted)

CodePudding user response:

Note that the datetime pattern format changed in Spark 3.0 (see the migration guide: https://spark.apache.org/docs/3.0.0-preview2/sql-migration-guide.html).

Unfortunately I don't have time to run the code locally and explain in detail what is wrong, but I remember hitting a similar problem when following a Spark 2 tutorial on a Spark 3.0 installation.
