I am looking to convert numbers in string format to timestamps.
Sample:
+--------+-------------------+
|date_str|expected date      |
+--------+-------------------+
|1.63E+15|1991-11-25 13:39:00|
|1.63E+15|1991-11-25 13:40:00|
|1.63E+15|1991-11-25 13:41:00|
+--------+-------------------+
I have tried using to_timestamp(), but it returns null values. I also tried converting with unix_timestamp, but no luck.
sdf1.select(F.to_timestamp(sdf1.date_str, 'yyyy/MM/dd HH:mm:ss').alias('date')).show()
sdf1.select(F.to_timestamp('date_str', 'dd/MM/yyyy HH:mm:ss')).show()
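For completeness, here is a minimal, self-contained reproduction of the null result (the tiny DataFrame below is an assumption built to match the sample above):
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# one sample row, with the number stored as a string in scientific notation
sdf1 = spark.createDataFrame([("1.63E+15",)], ["date_str"])

# the string does not match either date pattern, so both calls show null
sdf1.select(F.to_timestamp("date_str", "yyyy/MM/dd HH:mm:ss").alias("date")).show()
sdf1.select(F.to_timestamp("date_str", "dd/MM/yyyy HH:mm:ss").alias("date")).show()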
CodePudding user response:
First of all, the transformation you are asking for is not possible: the same input cannot produce different outputs. It should be:
+--------+-------------------+
|date_str|expected date      |
+--------+-------------------+
|1.63E+15|1991-11-25 13:39:00|
|1.63E+15|1991-11-25 13:39:00|
|1.63E+15|1991-11-25 13:39:00|
+--------+-------------------+
Then, the function you need is probably from_unixtime, which converts a unix timestamp in numeric format (seconds since the epoch) to a date/time string:
from pyspark.sql.functions import from_unixtime

time_df = spark.createDataFrame([(1428476400,)], ['unix_time'])
time_df.select(from_unixtime('unix_time').alias('ts')).collect()
# [Row(ts='2015-04-08 00:00:00')]
The only problem is that, in your example, your numbers are 16 digits long (E+15), which is too much precision for a unix timestamp in seconds; they look like microseconds since the epoch. You should probably divide them by 1000000.
from pyspark.sql import functions as F, types as T

df.withColumn(
    # cast the scientific-notation string to a number and convert microseconds to seconds
    "date_num", F.col("date_str").cast(T.DecimalType(16, 0)) / 1000000
).withColumn(
    # format the seconds value as a date/time string
    "date", F.from_unixtime("date_num")
).show()
+--------+-------------------+-------------------+
|date_str|           date_num|               date|
+--------+-------------------+-------------------+
|1.63E+15|1630000000.00000000|2021-08-26 17:46:40|
+--------+-------------------+-------------------+
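Note that from_unixtime returns a string column. If you need an actual TimestampType column, you can cast the result; a minimal sketch, under the same assumptions as above (a df with the date_str strings):
from pyspark.sql import functions as F, types as T

df.withColumn(
    "date_num", F.col("date_str").cast(T.DecimalType(16, 0)) / 1000000
).withColumn(
    # from_unixtime yields a string; casting turns it into a real timestamp column
    "date", F.from_unixtime("date_num").cast("timestamp")
).printSchema()
On Spark 3.1+ you could also use F.timestamp_seconds("date_num") to build the timestamp column directly from the seconds value.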