Home > Net >  Timestamp Timezone Wrong/Missing in Spark/Databricks SQL Output
Timestamp Timezone Wrong/Missing in Spark/Databricks SQL Output

Time:07-19

When converting a timestamp between timezones in databricks/spark sql, the timezone itself seems lost in the end result, and I can't seem to either keep it or add it back.

I have a bunch of UTC times and am using the Result of query

You can see that the third column has done the conversions correctly for the timezones, but the output itself still shows as being in UTC timezone.

Repeating this with a Query result with lowercase z in date_format

I can also use an O in the format output instead of a Z or z, but this just gives me GMT instead of UTC; same output basically.

All the Desired timestamp output format

Is there a way to do this? How do I either keep the timezone after the conversion or add it back in the format I need based on the timezone column I have? Given that the conversion works, and that I can output the end result with a 0000 on it, all the functionality to do this seems there, how do I put it together?

CodePudding user response:

Spark does not support TIMESTAMP WITH TIMEZONE datatype as defined by ANSI SQL. Even though there are some functions that convert the timestamp across timezones, this information is never stored. Databricks documentation on timestamps explains:

Spark SQL defines the timestamp type as TIMESTAMP WITH SESSION TIME ZONE, which is a combination of the fields (YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, SESSION TZ) where the YEAR through SECOND field identify a time instant in the UTC time zone, and where SESSION TZ is taken from the SQL config spark.sql.session.timeZone.

In your case spark.sql.session.timeZone is UTC and Z symbol in datetime pattern will always return UTC. Therefore you will never get a correct behavior with date_format if you deal with multiple timezones in a single query.

The only thing you can do is to explicitly store timezone information in a column and manually append it for display.

concat(
   date_format(from_utc_timestamp(createTimestampUTC, v.timezone), "yyyy-MM-dd'T'HH:mm:s "),
   v.timezone
) createTimestampLocal

This will display 2022-03-01T16:47:22.000 America/New_York. If you need an offset (-05:00) you will need to write a UDF to do the conversion and use Python or Scala native libraries that handle datetime conversions.

  • Related