Spark timestamp format with timezone issue

Time:09-14

I have the following code:

String timestampFormat = "yyyy-MM-dd'T'HH:mm:ssXXX";

or

String timestampFormat = "yyyy-MM-dd'T'HH:mm:ssZZZZZ";

Dataset<Row> inputDataFrame = spark.read()
            .format("CSV")
            .option("timestampFormat", timestampFormat)
            .load("path/file");

The value 2022-04-05T08:19:00+00:00 is loaded into the Hive table as 05.04.2022 10:19:00, which is 2 hours off. It should be 05.04.2022 08:19:00. Can someone tell me which format I should use?
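A quick way to check whether the pattern is the culprit is to parse the sample value with plain java.time, since Spark 3.x bases its datetime patterns on java.time.DateTimeFormatter. The sketch below (class and method names are hypothetical, and it does not use Spark itself) parses the value with both patterns from the question and shows that each yields the same UTC instant, 08:19Z. If that holds, the offset is being read correctly and the shift happens later, when the value is rendered.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class OffsetPatternCheck {
    // Parses text with the given pattern and returns the UTC instant it denotes.
    public static Instant parseToInstant(String text, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        return OffsetDateTime.parse(text, fmt).toInstant();
    }

    public static void main(String[] args) {
        String value = "2022-04-05T08:19:00+00:00";
        // Both patterns from the question, with the year written as "yyyy".
        Instant withX = parseToInstant(value, "yyyy-MM-dd'T'HH:mm:ssXXX");
        Instant withZ = parseToInstant(value, "yyyy-MM-dd'T'HH:mm:ssZZZZZ");
        System.out.println(withX);               // 2022-04-05T08:19:00Z
        System.out.println(withX.equals(withZ)); // true
    }
}
```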

Answer:

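You can set the Spark SQL session time zone as shown below and rerun the job. Spark keeps a parsed timestamp as a UTC-based instant and converts it to the session time zone (by default the JVM's local time zone) whenever it renders it as a local date-time, so a session running at UTC+2 would show 08:19 UTC as 10:19. This is the likely source of the 2-hour difference.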

--conf "spark.sql.session.timeZone=UTC"    # spark-submit option; change UTC to your time zone

or

spark.conf().set("spark.sql.session.timeZone", "UTC"); // change UTC to your time zone
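To illustrate why the session time zone changes what you see, the sketch below mimics only the display step with plain java.time; it is not Spark's internal code, and the class and method names are hypothetical. The same stored instant is rendered in the question's output format once in UTC and once in Europe/Berlin, which is an assumed example of a zone at UTC+2 in April, not something stated in the question.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class SessionZoneRendering {
    // Display format matching the question's Hive output, e.g. "05.04.2022 08:19:00".
    private static final DateTimeFormatter DISPLAY =
            DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss");

    // Renders a stored instant as wall-clock text in the given zone,
    // roughly what happens when a timestamp is shown in a session time zone.
    public static String render(Instant instant, String zoneId) {
        return DISPLAY.withZone(ZoneId.of(zoneId)).format(instant);
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2022-04-05T08:19:00Z");
        System.out.println(render(stored, "UTC"));           // 05.04.2022 08:19:00
        System.out.println(render(stored, "Europe/Berlin")); // 05.04.2022 10:19:00
    }
}
```

The underlying instant never changes; only the zone used to render it does, which is why changing spark.sql.session.timeZone rather than the timestampFormat pattern removes the shift.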
