Date Time Conversions in PySpark


Can someone please explain to me how the epoch time below

  • epoch time/unix timestamp: 1668443121840
  • converts to the date: 2022-11-14T16:25:21.840+0000

How does the conversion work? And how can you tell whether an epoch timestamp is expressed in seconds, milliseconds, microseconds, or nanoseconds?

Additionally, is there a function in PySpark to convert the date back to an epoch timestamp?

Thanks in advance!

I tried a number of methods, but I am not achieving the expected result:

import datetime

t = datetime.datetime.strptime('2021-11-12 02:12:23', '%Y-%m-%d %H:%M:%S')
print(t.strftime('%s'))  # '%s' is a non-portable glibc extension and uses the local time zone

With this approach I am not able to control the format or the precision (seconds, milliseconds, microseconds, or nanoseconds).

CodePudding user response:

The epoch time / unix timestamp uses a reference date, 00:00:00 UTC on 1 January 1970, and counts the seconds (or milliseconds) elapsed since then.
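For illustration, here is a minimal plain-Python sketch (outside Spark) of that conversion, along with the usual rule of thumb for identifying the unit: for present-day dates, roughly 10 digits means seconds, 13 milliseconds, 16 microseconds, and 19 nanoseconds.

import datetime

# 1668443121840 has 13 digits -> milliseconds
# (rule of thumb for current dates: ~10 digits = s, 13 = ms, 16 = µs, 19 = ns)
epoch_ms = 1668443121840

dt = datetime.datetime.fromtimestamp(epoch_ms / 1000, tz=datetime.timezone.utc)
print(dt.isoformat(timespec='milliseconds'))
# 2022-11-14T16:25:21.840+00:00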

The value you are looking for is in milliseconds, so you have to extract the millisecond fraction and concatenate it with the epoch seconds:

import pyspark.sql.functions as F

df = spark.createDataFrame([('2022-11-14T16:25:21.840+0000',)]).toDF("timestamp")

# "SSS" in date_format returns the fractional second as three digits (e.g. "840")
df\
.withColumn("timestamp", F.to_timestamp(F.col("timestamp")))\
.withColumn("epoch_seconds", F.unix_timestamp("timestamp"))\
.withColumn("epoch_milliseconds", F.concat(F.unix_timestamp("timestamp"), F.date_format("timestamp", "SSS")))\
.show(truncate=False)

# +----------------------+-------------+------------------+
# |timestamp             |epoch_seconds|epoch_milliseconds|
# +----------------------+-------------+------------------+
# |2022-11-14 16:25:21.84|1668443121   |1668443121840     |
# +----------------------+-------------+------------------+
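Side note: on Spark 3.1+ you can avoid the string concatenation entirely with the built-in SQL function unix_millis, which returns the epoch milliseconds directly. A sketch on the same df, going through F.expr in case your PySpark version does not expose it as a Python function:

df\
.withColumn("timestamp", F.to_timestamp(F.col("timestamp")))\
.withColumn("epoch_milliseconds", F.expr("unix_millis(timestamp)"))\
.show(truncate=False)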

CodePudding user response:

The unix timestamp counts the seconds that have elapsed since 00:00:00 UTC on 1 January 1970.

To convert a date back to a unix timestamp in PySpark, you can use the unix_timestamp function, as sketched below.
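A minimal sketch (assuming an active SparkSession named spark; note that unix_timestamp interprets the input in the session time zone, so the value below assumes spark.sql.session.timeZone=UTC):

import pyspark.sql.functions as F

df = spark.createDataFrame([('2022-11-14 16:25:21',)], ['date_str'])

df.withColumn("epoch_seconds", F.unix_timestamp("date_str", "yyyy-MM-dd HH:mm:ss"))\
  .show(truncate=False)
# epoch_seconds = 1668443121 when the session time zone is UTC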
