I've a dataframe with CreateDate column with this format:
CreateDate
/Date(1593786688000 0200)/
/Date(1446032157000 0100)/
/Date(1533904635000 0200)/
/Date(1447839805000 0100)/
/Date(1589451249000 0200)/
and I want to convert that format to date/timestamp, so the excepted output will be:
CreateDate
2020-07-03 14:31:28 02:00
2015-10-28 11:35:57 01:00
2018-08-10 12:37:15 02:00
2015-11-18 09:43:25 01:00
2020-05-14 10:14:09 02:00
I have this query in SQL that gives the desired output and that can help to develop:
cast(convert(VARCHAR(30), DATEADD(Second, convert(BIGINT, left(replace(replace(CreateDate, '/date(', ''), ')/', ''), 13)) / 1000, '1970-01-01 00:00:00'), 20) ' ' ' ' left(right(replace(replace(CreateDate, '/date(', ''), ')/', ''), 4), 2) ':' right(replace(replace(CreateDate, '/date(', ''), ')/', ''), 2) AS DATETIMEOFFSET(0)) AS CreateDate
Can anyone please help me in achieving this?
Thank you!
CodePudding user response:
Use regexp_extract
to pull timestamp and time offset.
Then from_unixtime
to convert seconds to timestamp. Note that from_unixtime
expects seconds, while you timestamp is in milliseconds, that's why we leave out 000
(naively assuming that in your data this is indeed zeros in all of the records, otherwise you'll need to adjust this a bit).
val re = """/Date\((\d )000(\ \d )"""
df.withColumn("ts", from_unixtime(regexp_extract($"CreateDate", re, 1)))
.withColumn("tz_offset", regexp_extract($"CreateDate", re, 2))
.show(false)
-------------------------- ------------------- ---------
|CreateDate |ts |tz_offset|
-------------------------- ------------------- ---------
|/Date(1593786688000 0200)/|2020-07-03 16:31:28| 0200 |
-------------------------- ------------------- ---------