Pyspark: Convert Column from String Numbers to Timestamp Type


I am looking to convert numbers in string format to timestamps.

Sample:

+--------+-------------------+
|date_str|expected date      |
+--------+-------------------+
|1.63E+15|1991-11-25 13:39:00|
|1.63E+15|1991-11-25 13:40:00|
|1.63E+15|1991-11-25 13:41:00|
+--------+-------------------+

I have tried using to_timestamp(), but it returns null values. I also tried converting with unix_timestamp, but no luck.

from pyspark.sql import functions as F

sdf1.select(F.to_timestamp(sdf1.date_str, 'yyyy/MM/dd HH:mm:ss').alias('date')).show()
sdf1.select(F.to_timestamp('date_str', 'dd/MM/yyyy HH:mm:ss')).show()

CodePudding user response:

First of all, the transformation you ask for is not possible: the same input cannot produce different outputs. It should be:

+--------+-------------------+
|date_str|expected date      |
+--------+-------------------+
|1.63E+15|1991-11-25 13:39:00|
|1.63E+15|1991-11-25 13:39:00|
|1.63E+15|1991-11-25 13:39:00|
+--------+-------------------+

Then, the function you need is probably from_unixtime, which converts a timestamp in numeric format (seconds since the Unix epoch) to a timestamp in string format:

from pyspark.sql.functions import from_unixtime

time_df = spark.createDataFrame([(1428476400,)], ['unix_time'])
time_df.select(from_unixtime('unix_time').alias('ts')).collect()
# [Row(ts='2015-04-08 00:00:00')]

The only problem is that, in your example, the numbers are 16 digits long (E+15), which is too much precision for a Unix timestamp in seconds. They look like microseconds since the epoch, so you should divide them by 1,000,000.
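As a quick sanity check (plain Python rather than Spark; this assumes, as above, that the values are microseconds since the Unix epoch), 1.63E+15 decodes to a date in 2021, which is also why the expected 1991 dates cannot be right:

from datetime import datetime, timezone

# 1.63E+15 microseconds / 1,000,000 = 1.63E+9 seconds since the epoch
print(datetime.fromtimestamp(1.63e15 / 1_000_000, tz=timezone.utc))
# 2021-08-26 17:46:40+00:00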

from pyspark.sql import functions as F, types as T

# Cast the scientific-notation string to a plain number, scale the
# microseconds down to seconds, then convert with from_unixtime.
df.withColumn(
    "date_num", F.col("date_str").cast(T.DecimalType(16, 0)) / 1000000
).withColumn(
    "date", F.from_unixtime("date_num")
).show()
+--------+-------------------+-------------------+
|date_str|           date_num|               date|
+--------+-------------------+-------------------+
|1.63E+15|1630000000.00000000|2021-08-26 17:46:40|
+--------+-------------------+-------------------+
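If you want a real TimestampType column instead of the string that from_unixtime returns, here is a minimal alternative sketch (not part of the original answer; it assumes the same microseconds interpretation and relies on Spark's numeric-to-timestamp cast, which treats the number as seconds since the epoch):

from pyspark.sql import functions as F

df.withColumn(
    "date",
    # string -> double, microseconds -> seconds, then cast to timestamp
    (F.col("date_str").cast("double") / 1000000).cast("timestamp"),
).show()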