Pyspark - How do I convert date/timestamp of format like /Date(1593786688000 0200)/ in pyspark?


I have a dataframe with a CreateDate column in this format:

CreateDate
/Date(1593786688000+0200)/
/Date(1446032157000+0100)/
/Date(1533904635000+0200)/
/Date(1447839805000+0100)/
/Date(1589451249000+0200)/

and I want to convert that format to a date/timestamp, so the expected output will be:

CreateDate
2020-07-03 14:31:28 +02:00
2015-10-28 11:35:57 +01:00
2018-08-10 12:37:15 +02:00
2015-11-18 09:43:25 +01:00
2020-05-14 10:14:09 +02:00

I have this SQL query that gives the desired output and may help as a reference:

cast(convert(VARCHAR(30), DATEADD(Second, convert(BIGINT, left(replace(replace(CreateDate, '/date(', ''), ')/', ''), 13)) / 1000, '1970-01-01 00:00:00'), 20) + ' ' + '+' + left(right(replace(replace(CreateDate, '/date(', ''), ')/', ''), 4), 2) + ':' + right(replace(replace(CreateDate, '/date(', ''), ')/', ''), 2) AS DATETIMEOFFSET(0)) AS CreateDate

Can anyone please help me in achieving this?

Thank you!

CodePudding user response:

Use regexp_extract to pull timestamp and time offset.

Then use from_unixtime to convert the seconds to a timestamp. Note that from_unixtime expects seconds, while your timestamp is in milliseconds; that's why the regex leaves out the trailing 000 (naively assuming these really are zeros in all of your records, otherwise you'll need to adjust this a bit).

import org.apache.spark.sql.functions.{from_unixtime, regexp_extract}
import spark.implicits._

val re = """/Date\((\d+)000(\+\d+)"""

df.withColumn("ts", from_unixtime(regexp_extract($"CreateDate", re, 1)))
  .withColumn("tz_offset", regexp_extract($"CreateDate", re, 2))
  .show(false)
+--------------------------+-------------------+---------+
|CreateDate                |ts                 |tz_offset|
+--------------------------+-------------------+---------+
|/Date(1593786688000+0200)/|2020-07-03 16:31:28|+0200    |
+--------------------------+-------------------+---------+
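
Since the question asks for PySpark, here is a minimal PySpark sketch of the same idea (the sample data, variable names, and exact regex below are illustrative, not taken from the original post). Instead of relying on the trailing 000, it divides the extracted milliseconds by 1000, so records with non-zero milliseconds are handled as well:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
# from_unixtime renders the result in the Spark session time zone; UTC
# reproduces the 14:31:28-style values from the expected output above.
spark.conf.set("spark.sql.session.timeZone", "UTC")

df = spark.createDataFrame(
    [("/Date(1593786688000+0200)/",), ("/Date(1446032157000+0100)/",)],
    ["CreateDate"],
)

# Group 1 captures the epoch milliseconds, group 2 the signed UTC offset.
pattern = r"/Date\((\d+)([+-]\d{4})\)/"

result = (
    df.withColumn(
        "ts",
        F.from_unixtime(
            (F.regexp_extract("CreateDate", pattern, 1).cast("long") / 1000).cast("long")
        ),
    )
    .withColumn("tz_offset", F.regexp_extract("CreateDate", pattern, 2))
)
result.show(truncate=False)

From there, turning +0200 into the +02:00 style is plain string manipulation, e.g. concatenating substrings of tz_offset, much like the left/right juggling in the SQL query above.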