Home > database >  Extract YYYY-MM-DD HH:MM: SS and convert to different time zone
Extract YYYY-MM-DD HH:MM: SS and convert to different time zone

Time:11-15

I am exploring different date formats and trying to convert date formats to others. Currently, I m stuck in a scenario where I have input dates and times as below: enter image description here

I was able to convert it to a date timestamp using concatenation

concat_ws(' ',new_df.transaction_date,new_df.Transaction_Time)

While I m trying to use withColumn("date_time2", F.to_date(col('date_time'), "MMM d yyyy hh:mmaa")) with ('spark.sql.legacy.timeParserPolicy','LEGACY')

It is displayed as 'undefined' enter image description here

I am looking for pointers/code snippets to extract YYYY-MM-DD HH:MM:SS in CET (input is in PST) as below

input_date_time output (in CET)
Mar 1, 2022 01:00:00 PM PST 2022-03-01 22:00:00

CodePudding user response:

Parse PST string to timestamp with timezone in UTC. Then convert to "CET" time:

import pyspark.sql.functions as F

df = spark.createDataFrame(data=[["Mar 1, 2022 01:00:00 PM PST"]], schema=["input_date_time_pst"])

df = df.withColumn("input_date_time_pst", F.to_timestamp("input_date_time_pst", format="MMM d, yyyy hh:mm:ss a z"))

df = df.withColumn("output_cet", F.from_utc_timestamp("input_date_time_pst", "CET"))

[Out]:
 ------------------- ------------------- 
|input_date_time_pst|output_cet         |
 ------------------- ------------------- 
|2022-03-01 21:00:00|2022-03-01 22:00:00|
 ------------------- ------------------- 

Note - The 2022-03-01 21:00:00 above is Mar 1, 2022 01:00:00 PM PST displayed in UTC.

  • Related